TITLE

"PROTEIN ENGINEERING" 
FIELD OF THE INVENTION 
THIS INVENTION relates to a method of identifying proteins 
suitable for protein engineering. In particular, the present invention 
relates to a computer database searching method of identifying proteins 
according to aspects of three-dimensional structure, and furthermore to 
the modification of proteins so identified to thereby possess one or more 
desired characteristics. Although not limited thereto, this invention relates 
to engineered proteins such as cytokine mimetics.

BACKGROUND OF THE INVENTION 
Proteins are central to life due to their crucial involvement in 
a variety of biological processes, such as enzyme catalysis of 
biochemical reactions, control of nucleic acid transcription and replication, 
hormonal regulation, signal transduction cascades and antigen 
recognition during immune responses. 

In many cases, one or more structural regions of a protein 
are responsible for a particular function, hereinafter referred to as 
"functional regions". These regions may constitute the active site of a 
protein enzyme, the nucleic acid binding domain of a transcription factor, 
a region of a protein cytokine crucial to binding the specific receptor for 
that cytokine, or antigen-binding regions of antigen receptors. 

A functional region of a protein usually comprises one or 
more amino acids which are required for that particular function, that is, 
they are essential for that function. 

In many cases, although these required amino acid residues 
are topographically proximal to each other, they may be well separated 
with respect to primary amino acid sequence, that is, they are non- 
contiguous. In addition, where there is more than one functional region of 
a protein, these regions may also be topographically proximal, but well 
separated in terms of primary amino acid sequence. In some cases, 
however, where there is more than one functional region involved in a
particular function, these functional regions may also be topographically 
well separated. This is a particularly important point with regard to the 
functional regions of cytokines. 

"Cytokine" as used herein includes and encompasses 
soluble protein molecules which have a cognate cell surface receptor, 
and which are involved in initiating, controlling and otherwise regulating a 
variety of processes relevant to cell growth, death and differentiation. 
Cytokines are typically exemplified by interferons (e.g. IFN-γ), interleukins
(for example IL-2, IL-4 and IL-6), growth and differentiation factors [e.g.
granulocyte colony stimulating factor (G-CSF) and erythropoietin (EPO)]
and others such as growth hormone (GH), prolactin, TGF-β, tumour
necrosis factor (TNF) and insulin. Each of these molecules is capable of 
binding a specific receptor and thereby eliciting a particular biological 
response or set of responses. 

The fact that a particular function of a protein can be 
attributed to one or more functional regions of that protein has formed the 
basis for strategies aimed at modifying a protein by adding or subtracting 
functional regions to modify the function of that protein. 

In this regard, the design and engineering of cytokine
mimetics has become an area of major importance. Because many
cytokine-cytokine receptor interactions are central to the regulation of a
variety of biological processes, it is envisaged that new mimetics will
become important therapeutic agents that either mimic or inhibit the
biological response to cytokine-cytokine receptor interactions.

A "mimetic" is a molecule which elicits a biological response 
either similar to, or more powerful than, that of another molecule (an 
"agonist"), or inhibits the action of the other molecule (an "antagonist"). 
The other molecule may be a cytokine, for example. 

With regard to designing and engineering mimetics based
on cytokines, a problem frequently encountered with many engineered
mimetics has been that they exhibit short biological half-lives and hence
minimal bioavailability and efficacy. In this regard, it has been proposed
that small cysteine-rich proteins might be useful as protein "scaffolds" for
engineering mimetics, due to their stability (Vita et al., 1995,
Proc. Natl. Acad. Sci. USA 92 6404). These small cysteine-rich proteins
comprise a disulfide-bonded core and exposed amino acid side chains at
the protein surface (Neilsen et al., 1996, J. Mol. Biol. 263 297). However,
the full potential of these proteins has not been realized, because
typical prior art strategies for protein engineering have largely been
limited to transferring or exchanging contiguous groups of amino acids
within individual secondary structural elements, such as loops, helices
or β-sheets, and no design strategies exist for selecting the most
appropriate disulfide-rich candidate.

Examples of such an approach include: the exchange
of secondary structural regions between RNase and angiogenin, either to
confer RNase activity on angiogenin (Harper et al., 1989, Biochemistry 28
1875) or angiogenic activity on RNase (Raines et al., 1995, J. Biol. Chem.
270 17180); the insertion of elastase inhibition activity into IL-1β by
transfer of the protease inhibitor loop of elastase to the IL-1β scaffold
(Wolfson et al., 1993, Biochemistry 32 5327); the insertion of a 10 amino
acid calcium-binding loop of thermolysin into Bacillus subtilis neutral
protease (Toma et al., 1991, Biochemistry 30 97); the insertion of a
β-sheet from a snake toxin to replace the β-sheet of charybdotoxin
(Drakopolou et al., 1996, J. Biol. Chem. 271 11979); and the
incorporation of a β-sheet from carbonic anhydrase into the β-sheet of
charybdotoxin (Pierret et al., 1995, J. Med. Chem. 35 2145).

Of growing importance in protein engineering has been the
use of computer-based technology combined with the elucidation of the
3D structures of small molecules and macromolecules. 3D molecular
structures are being generated at an increasing rate, such as by X-ray
crystallography and NMR techniques. These 3D structures can be stored in
generally accessible, searchable databases, such as the BROOKHAVEN
database.

For the purposes of this specification, a database will 
comprise a collection of "entries", each entry corresponding to a
representation of an aspect of 3D structure of a framework protein. A 
framework protein is simply any protein for which a 3D structure exists, 
either by experimental elucidation or by predictive means such as 
computer modelling. A framework protein is potentially useful as a 
scaffold which can be structurally modified for the purposes of imparting a 
particular function thereto. 

A "query" refers herein to a representation of an aspect of
3D structure of a protein which exhibits a function of interest. The 
representation of 3D structure would be in a form suitable for searching a 
database with the intention of identifying a "hit". A hit is an entry identified 
according to the particular query and the algorithm used to perform the 
search. 

An important advance in database searching has been 
made by representing 3D structures in terms of the relationship between 
atoms located in "distance space", rather than "Cartesian space" (Jakes & 
Willett, 1986, J. Mol. Graphics 4 12; Ho & Marshall, 1993, J. Comp. 
Aided. Mol. Des. 7 3). A location in Cartesian space is defined by three 
coordinates (x, y, z) which each correspond to a position along three
respective axes (X, Y, Z), each axis being oriented at right angles to the
other two. 

A location in distance space, however, is defined by
distances between atoms, expressed in the form of a distance matrix,
which lists the distance between every pair of atoms. Distance matrices are
therefore coordinate independent, and comparisons between distance
matrices can be made without restriction to a particular frame of
reference, such as is required using Cartesian coordinates.
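As a minimal sketch (plain Python, with made-up coordinates), the following computes a distance matrix and checks that rotating and translating the same atoms, which changes every Cartesian coordinate, leaves the matrix unchanged. The names `distance_matrix` and `rotate_z` are illustrative only.

```python
import math

def distance_matrix(coords):
    """Return the symmetric matrix of pairwise Euclidean distances."""
    n = len(coords)
    return [[math.dist(coords[i], coords[j]) for j in range(n)] for i in range(n)]

def rotate_z(point, theta):
    """Rotate a 3D point about the Z axis by theta radians."""
    x, y, z = point
    c, s = math.cos(theta), math.sin(theta)
    return (c * x - s * y, s * x + c * y, z)

# Four arbitrary atom positions (in Å), chosen only for illustration.
atoms = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.2, 0.4), (0.3, 2.0, -1.1)]

# The same arrangement in a new frame of reference: rotate, then translate.
offset = (5.0, -3.0, 2.0)
moved = [tuple(a + b for a, b in zip(rotate_z(p, 0.7), offset)) for p in atoms]

d1 = distance_matrix(atoms)
d2 = distance_matrix(moved)
same = all(math.isclose(d1[i][j], d2[i][j], abs_tol=1e-9) for i in range(4) for j in range(4))
print(same)  # True
```

Comparing two such matrices element by element needs no superposition step, which is the practical advantage the text attributes to distance space.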

It is important to emphasise that an arrangement of atoms
and its mirror image are described by identical distance matrices. A root
mean squared (RMS) difference can be used to resolve this ambiguity.
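The mirror-image ambiguity can be demonstrated directly. The sketch below (hypothetical helper names) reflects four atoms through the XY plane and confirms that the distance matrices agree. To tell the two arrangements apart it uses the sign of a triple product (a signed volume), a simple chirality test substituted here for illustration; it is not the RMS-based approach mentioned above.

```python
import math

def distance_matrix(coords):
    """Symmetric matrix of pairwise Euclidean distances."""
    n = len(coords)
    return [[math.dist(coords[i], coords[j]) for j in range(n)] for i in range(n)]

def signed_volume(p0, p1, p2, p3):
    """Triple product (p1-p0) . ((p2-p0) x (p3-p0)); its sign encodes handedness."""
    a = [p1[k] - p0[k] for k in range(3)]
    b = [p2[k] - p0[k] for k in range(3)]
    c = [p3[k] - p0[k] for k in range(3)]
    bxc = (b[1] * c[2] - b[2] * c[1],
           b[2] * c[0] - b[0] * c[2],
           b[0] * c[1] - b[1] * c[0])
    return sum(a[k] * bxc[k] for k in range(3))

atoms = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.2, 0.4), (0.3, 2.0, -1.1)]
mirror = [(x, y, -z) for x, y, z in atoms]  # reflection through the XY plane

d_orig, d_mirror = distance_matrix(atoms), distance_matrix(mirror)
identical = all(math.isclose(d_orig[i][j], d_mirror[i][j]) for i in range(4) for j in range(4))
print(identical)  # True: distances alone cannot distinguish the two
print(signed_volume(*atoms) < 0 < signed_volume(*mirror))  # True: handedness differs
```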

With regard to the 3D structure of proteins, a simplification
of protein structure can be provided by reducing a 3D structure to "Cα-Cβ
vectors" as discussed in McKie et al., 1995, Peptides: Chemistry,
Structure & Biology p 354-355. A Cα-Cβ vector occupies a location in 3D
space, the location being defined by the orientation of the covalent bond
between the α carbon and β carbon atoms of an amino acid (Lauri &
Bartlett, 1994, J. Comp. Aid. Mol. Des. 8 51). It will be appreciated that
each of the 20 naturally-occurring constituent amino acids of a protein
(except glycine) possesses a Cα-Cβ vector due to the covalent bond
between the "central" α carbon and the β carbon of the constituent side
chain.

For those proteins in the database containing Gly, it is
possible to mutate Gly to Ala to generate the required Cα-Cβ vector for
database searching.
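Modelling Gly as Ala amounts to placing a Cβ atom on the glycine backbone. One way to do this is sketched below. It assumes an empirical ideal-geometry formula, commonly used in structural bioinformatics, that builds Cβ from the backbone N, Cα and C positions. The coefficients and the name `virtual_cb` are assumptions for illustration, not part of this specification.

```python
import math

def virtual_cb(n, ca, c):
    """Place a virtual Cβ from backbone N, Cα and C coordinates (Å).

    Assumed empirical formula: Cβ = -0.58273431*a + 0.56802827*b
    - 0.54067466*c' + Cα, with b = Cα - N, c' = C - Cα and a = b x c'.
    """
    b = [ca[i] - n[i] for i in range(3)]
    cc = [c[i] - ca[i] for i in range(3)]
    a = (b[1] * cc[2] - b[2] * cc[1],
         b[2] * cc[0] - b[0] * cc[2],
         b[0] * cc[1] - b[1] * cc[0])
    return tuple(-0.58273431 * a[i] + 0.56802827 * b[i] - 0.54067466 * cc[i] + ca[i]
                 for i in range(3))

# Idealised glycine backbone: N-Cα 1.458 Å, Cα-C 1.525 Å, N-Cα-C angle about 111°.
n_atom = (1.458, 0.0, 0.0)
ca_atom = (0.0, 0.0, 0.0)
c_atom = (-0.5465, 1.4237, 0.0)

cb = virtual_cb(n_atom, ca_atom, c_atom)
print(round(math.dist(ca_atom, cb), 2))  # about 1.53 Å, a typical Cα-Cβ bond length
```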

The usefulness of Cα-Cβ vectors is that they provide a
simplification of 3D structure. Therefore, only the amino acid side-chains
of a functional region of a protein need be represented by the Cα-Cβ
vector map, thereby excluding the substantial portion of the protein(s) not
directly involved in that particular function. For the purposes of database
searching, Cα-Cβ vectors are ideal, as they constitute the basic 3D
structural information needed.
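Reducing a structure to Cα-Cβ vectors can be as simple as keeping two atoms per residue. The sketch below reads fixed-column PDB-format ATOM records (atom name in columns 13-16, chain in 22, residue number in 23-26, coordinates in 31-54) and returns one (Cα, Cβ) pair per residue; glycines, which lack a Cβ, are skipped. The function names are hypothetical, and alternate locations and insertion codes are ignored for brevity.

```python
def ca_cb_vectors(pdb_lines):
    """Return {(chain, residue number): (Cα xyz, Cβ xyz)} from PDB ATOM records."""
    atoms = {}
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue
        name = line[12:16].strip()
        if name not in ("CA", "CB"):
            continue
        key = (line[21], int(line[22:26]))
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        atoms.setdefault(key, {})[name] = xyz
    # Keep only residues that have both atoms (glycine has no Cβ).
    return {k: (v["CA"], v["CB"]) for k, v in atoms.items() if "CA" in v and "CB" in v}

def atom_line(serial, name, resname, chain, resseq, x, y, z):
    """Build a minimal fixed-column ATOM record (demonstration helper)."""
    return (f"ATOM  {serial:5d} {name:<4} {resname:>3} {chain}{resseq:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}")

lines = [
    atom_line(1, " N", "ALA", "A", 10, 1.458, 0.0, 0.0),
    atom_line(2, " CA", "ALA", "A", 10, 0.0, 0.0, 0.0),
    atom_line(3, " CB", "ALA", "A", 10, -0.533, -0.770, 1.210),
    atom_line(4, " CA", "GLY", "A", 11, 3.8, 0.0, 0.0),
]
vectors = ca_cb_vectors(lines)
print(vectors)  # {('A', 10): ((0.0, 0.0, 0.0), (-0.533, -0.77, 1.21))}
```

Restricting the input to the residues of a functional region gives the reduced "vector map" described above.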

After identification of Cα-Cβ vectors corresponding to a
protein or a functional region thereof, the parameters that characterize
each vector must be stored in a database in such a way that retrieval in
response to a query can be made quickly. A number of options are
available for suitable representation of Cα-Cβ vectors, whether as a
database entry or as a query: -

(A) as a distance matrix;

(B) as a dihedral angle (δ) formed between respective
Cα-Cβ vectors;

(C) as angles α1 and α2 formed between respective
Cα-Cβ vectors.
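For a single pair of Cα-Cβ vectors, the three representations can be computed as sketched below. The angle conventions are an assumption for illustration and may differ in detail from Lauri & Bartlett: α1 (α2) is taken as the angle between vector 1 (2) and the line joining the two Cα atoms, and δ as the Cβ1-Cα1-Cα2-Cβ2 dihedral. Form (A) is represented by the four inter-vector distances.

```python
import math

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])

def angle_deg(u, v):
    """Angle between vectors u and v in degrees."""
    cos_uv = dot(u, v) / math.sqrt(dot(u, u) * dot(v, v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_uv))))

def dihedral_deg(p0, p1, p2, p3):
    """Signed dihedral angle p0-p1-p2-p3 in degrees, between -180 and 180."""
    b0, b1, b2 = sub(p0, p1), sub(p2, p1), sub(p3, p2)
    length = math.sqrt(dot(b1, b1))
    b1 = tuple(x / length for x in b1)
    v = sub(b0, tuple(dot(b0, b1) * x for x in b1))  # b0 with its axial part removed
    w = sub(b2, tuple(dot(b2, b1) * x for x in b1))  # b2 with its axial part removed
    return math.degrees(math.atan2(dot(cross(b1, v), w), dot(v, w)))

def pair_descriptors(ca1, cb1, ca2, cb2):
    return {
        "distances": (math.dist(ca1, ca2), math.dist(ca1, cb2),
                      math.dist(cb1, ca2), math.dist(cb1, cb2)),  # (A)
        "delta": dihedral_deg(cb1, ca1, ca2, cb2),                # (B)
        "alpha1": angle_deg(sub(cb1, ca1), sub(ca2, ca1)),        # (C)
        "alpha2": angle_deg(sub(cb2, ca2), sub(ca1, ca2)),
    }

desc = pair_descriptors((0.0, 0.0, 0.0), (0.0, 1.5, 0.0), (4.0, 0.0, 0.0), (4.0, 0.0, 1.5))
# For this perpendicular arrangement alpha1, alpha2 and delta are each 90 degrees.
print(desc["alpha1"], desc["alpha2"], desc["delta"])
```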

A simple explanation of these representations is provided in
Lauri & Bartlett, 1994, supra, which is herein incorporated by
reference. The key to successful database searching is speed and
efficiency. Thus, computer search algorithms have been developed which
use a strategy whereby the vast majority of entries in the database are
eliminated in a preliminary screening step.

These algorithms are demanding of computer resources, 
and therefore a search is normally effected in two stages: - 

(1) a screening search to eliminate entries that cannot 
possibly constitute a hit; and 

(2) an atom-by-atom comparison of a query with each 
entry not eliminated in (1), to identify one or more 
hits. 
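The two-stage strategy can be illustrated with a deliberately simple sketch. Stage (1) here uses one cheap necessary condition (every query interatomic distance must occur somewhere in the entry, within a tolerance); stage (2) is a brute-force search for a one-to-one atom assignment whose pairwise distances all agree. Both tests, the names and the 1.0 Å default tolerance are assumptions for illustration, and brute force is only practical for very small queries.

```python
import itertools
import math

def pair_distances(coords):
    return [math.dist(p, q) for p, q in itertools.combinations(coords, 2)]

def passes_screen(query, entry, tol):
    """Stage 1: cheap rejection of entries that cannot possibly be hits."""
    entry_d = pair_distances(entry)
    return all(any(abs(dq - de) <= tol for de in entry_d) for dq in pair_distances(query))

def atom_by_atom(query, entry, tol):
    """Stage 2: return a query->entry atom assignment matching all distances, or None."""
    pairs = list(itertools.combinations(range(len(query)), 2))
    for perm in itertools.permutations(range(len(entry)), len(query)):
        if all(abs(math.dist(query[i], query[j]) - math.dist(entry[perm[i]], entry[perm[j]])) <= tol
               for i, j in pairs):
            return perm
    return None

def two_stage_search(query, database, tol=1.0):
    hits = {}
    for name, entry in database.items():
        if not passes_screen(query, entry, tol):
            continue  # eliminated in the screening stage
        assignment = atom_by_atom(query, entry, tol)
        if assignment is not None:
            hits[name] = assignment
    return hits

query = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 4.0, 0.0)]  # a 3-4-5 triangle
database = {
    "entry_hit": [(10.0, 10.0, 10.0), (10.0, 10.0, 13.0), (10.0, 14.0, 10.0), (20.0, 20.0, 20.0)],
    "entry_miss": [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
}
print(two_stage_search(query, database))  # {'entry_hit': (0, 1, 2)}
```

Only entries surviving the cheap screen reach the expensive assignment search, which is the point of splitting the work into two stages.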

The search in (1) could screen entries based on geometric
attributes of the query (Lesk, 1979, Commun. ACM 22 219), interatomic
distances and atom types (Jakes & Willett, 1986, supra), aromaticity,
hybridization, connectivity, charge, position of lone pair electrons, or
centre of mass of ring structures (Sheridan et al., 1989, Proc. Natl Acad.
Sci. 86 8165). This screening process would eliminate entries that have
no chance of meeting the 3D constraints of the query.

This strategy, although quick, requires that for an entry to 
register as a hit, it must comprise every specified query component. As 
the number of query components increases, the number of near misses 
increases and the likelihood of finding a hit decreases. 

A more useful search strategy which assesses the relative 
merits of each near miss as well as each hit has recently been provided 
by the search program FOUNDATION (Ho & Marshall, 1993, supra). 
FOUNDATION uses a clique-detection algorithm (various algorithms are 
reviewed and compared in Brint & Willett, 1987, J. Mol. Graphics 5 49
and Brint & Willett, 1987, J. Chem. Inf. Comput. Sci. 27 152) which searches
a 3D database of entries for a user-defined query consisting of the
coordinates of various atoms and/or bonds of a 3D structural feature.
FOUNDATION identifies all possible entries that contain any combination 
of a user-specified minimum number of matching atoms and/or bonds as 
hits. 

Despite the usefulness of 3D database searching as a 
means of identifying structurally related proteins, this approach has not 
been well utilized with respect to engineering proteins with a desired 
function. 

OBJECT OF THE INVENTION 
The present inventors have recognized that 3D database 
searching is useful for identifying proteins which have one or more 
desired structural features, such proteins being candidate "frameworks" 
for the subsequent engineering of proteins with desired characteristics or 
functions. Furthermore, the present inventors have realized that protein 
engineering is best achieved by modification of a framework protein to 
incorporate particular amino acid residues required for a characteristic, 
property or function, rather than by incorporating entire elements of 
secondary structure such as loops or helices. This is particularly 
applicable when functionally important amino acids are scattered 
throughout a protein and are not confined to particular regions of primary
or secondary structure.

It is therefore an object of the present invention to provide a
novel method of protein engineering.

SUMMARY OF THE INVENTION 
In one aspect, the present invention resides in a method of 
protein engineering including the steps of:- 

(i) creating a computer database which includes a
plurality of entries, each said entry corresponding to
a description of a location and orientation in 3D
space of side chains of amino acid residues of a
framework protein, wherein the location and
orientation of each side chain is simplified as a
Cα-Cβ vector;

(ii) creating a query corresponding to a description of a
location and orientation in 3D space of respective
side chains of two or more amino acid residues of a
sample protein which are required for a function of
said sample protein, wherein the location and
orientation of each side chain is simplified as a
Cα-Cβ vector; and

(iii) searching said database with said query to thereby 
identify one or more hits wherein at least one of said 
hits corresponds to a respective said framework 
protein which has structural similarity to said sample 
protein. 

Preferably, the framework protein is capable of internal 
disulfide bond formation. More preferably, the framework protein is a 
small cysteine-rich protein comprising 70 amino acids or less, having
2-11 disulfide bonds.

In another aspect, the present invention provides a method
of protein engineering including the steps of:-

(i) creating a computer database which includes a 
plurality of entries, each said entry corresponding to 
a description of a location and orientation in 3D 
space of amino acid residues of a framework protein 
capable of internal disulfide bond formation; 

(ii) creating a query corresponding to a description of a 
location and orientation in 3D space of two or more 
amino acid residues of a sample protein which are 
required for a function of said sample protein; and
(iii) searching said database with said query to thereby 
identify one or more hits wherein at least one of said 
hits corresponds to a respective said framework 
protein which has structural similarity to said sample 
protein. 

Preferably, the framework protein is a small cysteine-rich 
protein comprising 70 amino acids or less, having 2-11 disulfide bonds.

Preferably, the location and orientation of each amino acid
side-chain of said framework protein and said sample protein is
represented by a Cα-Cβ vector.

In one embodiment applicable to the first- and/or second- 
mentioned aspects, the method includes the step of modifying an amino 
acid sequence of said framework protein which corresponds to a hit, by 
substituting at least one amino acid residue thereof with at least one 
amino acid residue of said sample protein. 

Preferably, said at least one amino acid residue of said 
sample protein represents at least a portion of a functional region of said 
sample protein. 

More preferably, at least two of the amino acid residues of 
said sample protein which substitute amino acid residues of said 
framework protein are non-contiguous in primary sequence. 

Preferably, the modified framework protein has greater
stability than said sample protein.

Preferably, the framework protein so modified has increased 
structural similarity to said sample protein. 

Advantageously, the modified framework protein is capable 
of exhibiting a function which is either similar to, or inhibitory of, a function
of said sample protein. 

In one embodiment, said sample protein is a cytokine 
selected from the group consisting of GH, IL-4, IL-6 and G-CSF.

In yet another aspect, the invention provides an engineered 
protein comprising 70 amino acid residues or less of a framework protein
and 2-11 disulfide bonds of said framework protein, together with at least 
two amino acids of another protein which are non-contiguous in primary 
sequence and which represent at least a portion of a functional region of 
said another protein. 

Preferably, the engineered protein has greater stability than 
said another protein. 

More preferably, the engineered protein exhibits a function 
either similar to, or inhibitory of, said another protein.

In one embodiment, said another protein is a cytokine 
selected from the group consisting of GH, IL-4, IL-6 and G-CSF.

In a particular embodiment, the engineered protein has an 
amino acid sequence selected from the group consisting of the amino 
acid sequences of SCY01, SCY02, SCY03, ERP01, ERP02, ERP03 and
VIB01. 

In still yet another aspect, the present invention resides in a 
computer program for searching a protein structure database. 

In one embodiment, the computer program is for searching a
protein database comprising a plurality of entries, each said entry
corresponding to a distance matrix representation of two or more Cα-Cβ
vectors, said program including the steps of:

(i) comparing a query with each said database entry,
said query corresponding to a distance matrix
representation of two or more Cα-Cβ vectors; and

(ii) identifying hits by clique detection, wherein a hit is
defined according to a minimum number of Cα-Cβ
vector matches between said query and each said
entry.
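The two program steps can be sketched as follows, under stated assumptions: each vector is a (Cα, Cβ) coordinate pair, two query vectors are compatible with two entry vectors when the four inter-vector distances agree within a tolerance, and hits are found as cliques in a correspondence graph whose nodes pair a query vector with an entry vector. The simple branch-and-bound maximum-clique search and all names are illustrative; this is not the VECTRIX source.

```python
import itertools
import math

def pair_matches(q1, q2, e1, e2, tol):
    """Four-distance test: Cα1-Cα2, Cα1-Cβ2, Cβ1-Cα2 and Cβ1-Cβ2 within tol."""
    return all(abs(math.dist(q1[a], q2[b]) - math.dist(e1[a], e2[b])) <= tol
               for a, b in ((0, 0), (0, 1), (1, 0), (1, 1)))

def maximum_clique(nodes, adj):
    """Simple branch-and-bound maximum-clique search (adequate for small graphs)."""
    best = []

    def grow(clique, candidates):
        nonlocal best
        if len(clique) > len(best):
            best = clique
        for i, v in enumerate(candidates):
            if len(clique) + len(candidates) - i <= len(best):
                return  # the remaining candidates cannot beat the best clique
            grow(clique + [v], [u for u in candidates[i + 1:] if u in adj[v]])

    grow([], list(nodes))
    return best

def search_entry(query, entry, min_match, tol=1.0):
    """Return matched (query index, entry index) pairs, or None if fewer than min_match."""
    nodes = [(i, j) for i in range(len(query)) for j in range(len(entry))]
    adj = {node: set() for node in nodes}
    for (i, j), (k, l) in itertools.combinations(nodes, 2):
        if i != k and j != l and pair_matches(query[i], query[k], entry[j], entry[l], tol):
            adj[(i, j)].add((k, l))
            adj[(k, l)].add((i, j))
    clique = maximum_clique(nodes, adj)
    return sorted(clique) if len(clique) >= min_match else None

def shift(p):
    return (p[0] + 10.0, p[1], p[2])

query = [((0.0, 0.0, 0.0), (0.0, 0.0, 1.5)),
         ((4.0, 0.0, 0.0), (4.0, 1.0, 1.0)),
         ((-3.0, 9.0, 0.0), (-2.0, 9.0, 1.0))]
entry = [((50.0, 50.0, 50.0), (50.0, 50.0, 51.5))]   # unrelated decoy vector
entry += [(shift(ca), shift(cb)) for ca, cb in query]  # the query, translated

print(search_entry(query, entry, min_match=3))  # [(0, 1), (1, 2), (2, 3)]
print(search_entry(query, entry, min_match=4))  # None
```

Because every node of the clique pairs one query vector with one entry vector, the clique size is directly the number of Cα-Cβ vector matches compared against the minimum.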

Throughout this specification and claims which follow, 
unless the context requires otherwise, "comprise", "comprises" and 
"comprising" are used inclusively, so that a stated integer or integer group 
does not exclude other integers or integer groups.

It will also be appreciated that throughout this specification 
and claims, scientific terms are to be given their usual scientific meaning, 
although certain terms are defined herein to assist interpretation by the 
skilled person. 

BRIEF DESCRIPTION OF THE FIGURES AND TABLES
Table 1: An example of a query file which defines the query Cα-Cβ
vectors, the tolerance for each query atom and the definition
of a subset.

Table 2: Blood serum stability test results of a solution of SCY01.

Table 3: Enzyme stability test results of a solution of SCY01.

FIG. 1: Amino acid sequences of the hGH high affinity site 
antagonist framework scyllatoxin, the hGH antagonists 
SCY01, SCY02, SCY03 and their alignment with the hGH 
sequence. Disulfide linkages are indicated by lines 
connecting cysteines. 

FIG. 2: Amino acid sequences for the hGH agonist framework VIB, 
the engineered molecule VIB01 and the alignment with the 
hGH sequence. Disulfide linkages are indicated by lines 
connecting cysteines. 

FIG. 3: Comparison of the hGH structure with the hGH agonist
molecule VIB01, showing the very high degree of overlap of
the alpha helices.

FIG. 4: Schematic overview of database searching.

FIG. 5: Two-dimensional depiction of three different representations
of a pair of Cα-Cβ vectors: d = interatomic distance as used
to construct distance matrices; δ = dihedral angle; and
α1, α2 = angles.

FIG. 6: Circular dichroism spectra of SCY01 showing little change
in the structure on temperature changes or on the addition
of the helix-stabilizing agent trifluoroethanol.
FIG. 7: Structure of the engineered SCY01 molecule shown in
comparison with the native scyllatoxin molecule.

FIG. 8: Biological effect of SCY01 on BaF3 cell proliferation, by
inhibiting the growth response of the cells to 0.5 ng/mL
hGH, but not to 50 U/mL IL-3.
FIG. 9: Amino acid sequences for the low affinity site hGH
antagonist framework ZDC, the engineered hGH
antagonist ZDC05 and the aligned hGH sequence.
Disulfide linkages are indicated by lines connecting
cysteines.

FIG. 10: Circular dichroism spectra of VIB01.

FIG. 11: Amino acid sequences of the hGH agonist framework ERP,
the engineered molecules ERP01, ERP02, ERP03 and their 
alignment with the hGH sequence. Disulfide linkages are 
indicated by lines connecting cysteines. 

FIG. 12: Circular dichroism spectra of ERP03 showing little change
in the structure on temperature changes or on the addition
of the helix-stabilizing agent trifluoroethanol.

FIG. 13: Comparison of secondary Hα shifts for ERP01 and ERP03,
showing substantially identical structure and disulphide
connectivities. The shaded bars show the invariant
residues of the native ERP molecule. Legend: ERP03 δHα;
ERP δHα.

FIG. 14: Amino acid sequences of the CD4 frameworks PTA and
SCY, the engineered molecules PTA CD4, and SCY CD4 
and the alignment with the CD4 sequence. Disulfide 
linkages are indicated by lines connecting cysteines. 
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS 

It will be appreciated that the present invention is
predicated, at least in part, on the present inventors' realization that, in
order to identify framework proteins suitable for further modification by
protein engineering, it is advantageous to search databases according to
the orientation in 3D space of constituent amino acid side-chains of the
framework protein, with respect to constituent amino acid side-chains of
the sample protein which is the subject of the query. Framework protein
"hits" so identified suitably share similarity, such as in terms of
topography and chemistry, to the sample protein "query", and as such
may be suitable candidates for further modification. A particular aspect of
the present invention is that a modified framework protein may display
one or more desired characteristics, such as increased stability and in
some cases a function similar to or inhibitory of the sample protein.

AMENDED SHEET IPEA/AU
Received 6 June 2000

Referring to the first- and second-mentioned
aspects, preferably, each said entry corresponds to a description in the
form of a distance matrix representation of said Cα-Cβ vectors.
Alternatively, said Cα-Cβ vectors may be represented by dihedral
angles or α1 and α2 angles.

As used herein, "protein" and "polypeptide" are used 
interchangeably with regard to amino acid polymers. A "peptide" is a 
protein which has no more than fifty (50) amino acids. 

As used herein, a "framework protein" is any protein which
exhibits one or more desired structural features which provide
advantages which include size, solubility and/or stability. "Stability" in this
context includes resistance to degradation by proteolytic enzymes and/or
temperature variation and/or resistance to denaturation by chaotropic
agents and/or denaturing detergents, changes in pH, pH extremes, and/or
REDOX extremes and/or changes.

The framework protein may be capable of internal disulfide 
bond formation. Preferably, said framework protein comprises 70 amino 
acids or less, having 2-11 disulfide bonds, which is an example of "a 
small cysteine-rich protein". 

The amino acids used for creating each said entry may
include some or all of the constituent amino acids of the framework
protein.

As used herein, a "sample protein" is a protein which has
one or more functional characteristics of interest which render it desirable 
for the purposes of protein engineering. 

Suitably, the sample protein may be an enzyme, nucleic 
acid-binding protein, cytokine, antigen, receptor, ion channel, chaperonin, 
or any protein with a function of interest. 

In an embodiment, said sample protein is a cytokine 
selected from the group consisting of GH, IL-4, G-CSF, IL-6 and EPO.

Preferably, said function of said sample protein comprises
binding a specific receptor to thereby elicit a biological response.
However, a variety of other functions are contemplated, such as catalysis,
binding cations (Zn²⁺, Ca²⁺, Mg²⁺), transporting ions (e.g. Cl⁻, K⁺, Na⁺),
binding lipids, binding nucleic acids as a means of transcriptional
regulation or regulating DNA replication, assisting protein folding and
transport, and any other function carried out by proteins.

With regard to creating a query, it is preferred that each
said query corresponds to a description in the form of a distance
matrix representation of Cα-Cβ vectors. However, other representations
such as dihedral angles or α1 and α2 angles may also be applicable.

Preferably, said computer program used for searching said
database is the VECTRIX program, as will be described in detail
hereinafter. VECTRIX incorporates the FOUNDATION algorithm (Ho &
Marshall, 1993, supra, which is herein incorporated by reference).
Program FOUNDATION searches 3D databases of small organic
molecules to identify structures that contain any combination of a
user-specified minimum number of matching elements of a user-defined query.
It achieves this by first using a distance matrix to define the topography of
the query atoms, followed by screening using various query constraints
which define the chemical nature of the structure. The topology of the
atoms in each database structure is again represented using a distance matrix.
Structural fragments in the database whose distance descriptions match
those of the query are identified using graph theory (Gibbons, Algorithmic
Graph Theory; Cambridge University Press: Cambridge, 1988).

In graph theory, a graph is a structure composed of nodes
(vertices) connected by edges. A graph is completely connected when all
nodes are connected to one another. A subgraph is any subset of a
larger graph. A completely connected subgraph that cannot be enlarged by
adding another node is called a clique. Thus, the query is a completely
connected graph, as all interatomic distances are determined in the
distance matrix. The task is to search a structural database to find all
cliques that contain at least a user-defined number of matching nodes.
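To make the graph-theory terms concrete, here is a sketch of the Bron & Kerbosch (1973) procedure in its commonly used pivoting variant. It enumerates every maximal clique and keeps those with at least a user-defined number of nodes. The example graph and function names are hypothetical.

```python
def bron_kerbosch(r, p, x, adj, out):
    """Append every maximal clique that extends r (candidates p, excluded x) to out."""
    if not p and not x:
        out.append(r)
        return
    # Pivot on the vertex with most neighbours in p to avoid redundant branches.
    pivot = max(p | x, key=lambda u: len(adj[u] & p))
    for v in list(p - adj[pivot]):
        bron_kerbosch(r | {v}, p & adj[v], x & adj[v], adj, out)
        p = p - {v}
        x = x | {v}

def cliques_with_at_least(adj, min_nodes):
    found = []
    bron_kerbosch(set(), set(adj), set(), adj, found)
    return [c for c in found if len(c) >= min_nodes]

# A triangle a-b-c with a tail c-d-e.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("d", "e")]
adj = {v: set() for edge in edges for v in edge}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

print(cliques_with_at_least(adj, 3))  # one clique, {'a', 'b', 'c'} (set order may vary)
```

Lowering the minimum to 2 would also report the cliques {c, d} and {d, e}, mirroring how a smaller minimum match count admits more near misses.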

There are many clique-finding algorithms. Some of the well
known procedures include those by Bonner, 1964, IBM J. Res. Develop.
8 22; Gerhards & Lindenberg, 1981, Computing 27 349 and Bron &
Kerbosch, 1973, Commun. ACM 16 575. Computational chemists have
adapted these algorithms or implemented similar ideas to facilitate
searching for 3D structures within databases (Kuntz et al., 1982, J. Mol.
Biol. 161 269; DesJarlais et al., 1988, J. Med. Chem. 31 722; DesJarlais
et al., 1990, Proc. Natl. Acad. Sci. 87 6644; Crandell & Smith, 1983, J.
Chem. Inf. Comput. Sci. 23 186; Brint & Willett, 1987, J. Mol. Graphics 5
49-56; Kuhl et al., 1984, J. Comput. Chem. 5 24 and Smellie et al., 1991,
J. Chem. Inf. Sci. 31 386).

WO 00/23474 PCT/AU99/00914
Computer Database Searching 
VECTRIX 

The present inventors have created a program "VECTRIX", 
which is a modified version of the clique-detection algorithm in program 
FOUNDATION as described by Ho & Marshall, 1993, J. Comp. Aided. 
Mol. Des. 7 3-22. The search procedure is illustrated in Scheme A. The 
major changes in comparison to Ho & Marshall, 1993, supra include:-
• the query and database structures are both proteins;

• the query elements are a distance matrix defining the topography
of Cα-Cβ vectors, not individual atoms as in FOUNDATION;
similarly, the database structure is defined as a Cα-Cβ vector distance
matrix and not every atom as in FOUNDATION;

• in FOUNDATION, a pair of atoms in a query is considered to match
with a pair of atoms in an entry in the database if the atom-type
and the distance between them are matched; in VECTRIX, a pair of
Cα-Cβ vectors in a query is considered to match with a pair of
Cα-Cβ vectors in an entry in the database if the four distances
(Cα1-Cα2; Cα1-Cβ2; Cβ1-Cα2; Cβ1-Cβ2) between the pairs are matched;
and

• the FOUNDATION program performs the clique detection, steric
filtering and subset filtering together and outputs the hits that
satisfy the three criteria; by design, the VECTRIX program outputs
all hits that have a number of matches greater than or equal to
MIN_MATCH. POSTVEC is then used to filter those hits based on
steric filtering, a new MIN_MATCH and subset consideration; by
separating the clique detection hits and the filtering process, the
VECTRIX program is more flexible.
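The VECTRIX pair-matching criterion in the third point can be expressed as a short predicate. In this sketch (hypothetical names; the tolerance is assumed to be 1.0 Å, the per-distance value in the Table 1 example), two vector pairs with identical Cα positions still fail to match when one Cβ points in a different direction, a difference that a comparison of single atoms at the Cα positions would miss.

```python
import math

def vectrix_pair_match(q1, q2, e1, e2, tol=1.0):
    """True if query vectors (q1, q2) match entry vectors (e1, e2).

    Each vector is a (ca_xyz, cb_xyz) tuple. All four distances
    Cα1-Cα2, Cα1-Cβ2, Cβ1-Cα2 and Cβ1-Cβ2 must agree within tol (Å).
    """
    for a, b in ((0, 0), (0, 1), (1, 0), (1, 1)):
        if abs(math.dist(q1[a], q2[b]) - math.dist(e1[a], e2[b])) > tol:
            return False
    return True

q1 = ((0.0, 0.0, 0.0), (0.0, 0.0, 1.5))
q2 = ((5.0, 0.0, 0.0), (5.0, 1.5, 0.0))
e2_reoriented = ((5.0, 0.0, 0.0), (3.5, 0.0, 0.0))  # same Cα, Cβ now points at vector 1

print(vectrix_pair_match(q1, q2, q1, q2))             # True
print(vectrix_pair_match(q1, q2, q1, e2_reoriented))  # False
```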

An outline of a program written by the present inventors is shown in
Scheme A.

The VECTRIX program requires four parameters: (1)
query.file; (2) database.file; (3) steric.file and (4) MIN_MATCH. The
parameters are described in detail below. 
(1) query.file 

query.file (for example as in Table 1) contains the definition
of the query, the definition of tolerance for each query atom and the
definition of SUBSET. The three definitions are described below:-

Query definition: Prior to running the VECTRIX program, a particular
target protein is selected. The target protein's three-dimensional
structure must have been determined by experimental or theoretical
means well known in the art. The functional amino acids of the target
protein must be defined and the Cα-Cβ vectors for those functional
residues extracted to the query.file. Table 1 shows the definition of
Cα-Cβ vectors of four functional residues. The numbers in columns 7, 8
and 9 represent the x, y and z coordinates of the vectors respectively.

Tolerance definition: The tolerance defines the allowable uncertainty in the
position of each atom. Note that the final tolerance of a distance from
atom A to atom B is the sum of the individual tolerances of atoms A and B.
In Table 1, the tolerances for individual atoms are defined in column 10
to be 0.5 Å, so the tolerance for a distance between two atoms is 1.0 Å.

Subset definition: A list of atoms can be grouped into a SUBSET. The
query file allows for the definition of as many SUBSETS as are required.
The SUBSET definition will be used in the POSTVEC program to filter the
hits to obtain more relevant hits. In Table 1, the 1st SUBSET command is
defined as subset 1 and it consists of Cα-Cβ vector numbers 1, 3 and 4.
The 2nd SUBSET command is defined as subset 2 and it consists of
Cα-Cβ vector number 2.
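The tolerance rule above (the allowed error on a distance is the sum of the two atoms' individual tolerances) can be sketched as follows. The function names are hypothetical, the 0.25 Å value is made up to show unequal tolerances, and reading tolerances from column 10 of the query file is left out because the exact file layout is not reproduced here.

```python
def distance_tolerances(atom_tols):
    """Pairwise distance tolerance matrix: tol[i][j] = atom_tols[i] + atom_tols[j]."""
    n = len(atom_tols)
    return [[0.0 if i == j else atom_tols[i] + atom_tols[j] for j in range(n)]
            for i in range(n)]

def distance_ok(d_query, d_entry, tol_matrix, i, j):
    """True if an entry distance matches query distance i-j within its tolerance."""
    return abs(d_query - d_entry) <= tol_matrix[i][j]

tols = distance_tolerances([0.5, 0.5, 0.5, 0.25])  # per-atom tolerances in Å
print(tols[0][1], tols[0][3])                       # 1.0 0.75
print(distance_ok(6.2, 7.1, tols, 0, 1))            # True  (0.9 <= 1.0)
print(distance_ok(6.2, 7.1, tols, 0, 3))            # False (0.9 > 0.75)
```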

(2) database.file 

database.file contains a list of file names that correspond 
with the entries constituting the database. 

(3) steric.file 

steric.file contains the coordinates of the grid points
representing the ligand or receptor space. There are two forms of steric
filtering, depending on whether the 3D structure of a receptor or of a
ligand is available. If the structure of the receptor is known and the query
is built from the Cα-Cβ vectors corresponding to the receptor-binding
amino acid side chains of a ligand (a cytokine, for example), then a hit
must be evaluated in terms of whether it would invade the 3D space
accessed by the receptor upon binding the ligand (receptor-based
filtering). Alternatively, if the structure of the ligand is known and the
query is built from the Cα-Cβ vectors corresponding to the receptor-binding
amino acid side chains of that ligand, then a hit must be evaluated in
terms of whether it would invade the 3D space not occupied by the ligand
(ligand-based filtering). The mode is identified in the first line of the
'steric file'. The first step in our steric filtering algorithm is the
calculation of the grid points that represent the ligand or receptor 3D
space using the program PREPARE_STERIC_FILTER. The program first
defines the limits of the structure by determining the maxima and minima
in the x, y and z dimensions. Then, for each grid point (1 Å apart) within
these limits, an xyz coordinate is output to the 'steric file' if the point is
in steric contact with the receptor or the ligand.
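A minimal sketch of the grid construction described for PREPARE_STERIC_FILTER follows. The text does not define "steric contact", so this sketch assumes a grid point is in contact when it lies within a user-chosen cutoff of any atom (the 2.0 Å default is hypothetical). The function names and the file layout after the mode line are likewise assumptions.

```python
import itertools
import math


def prepare_steric_grid(atoms, contact_cutoff=2.0, spacing=1.0):
    """Return grid points (within the structure's limits) in contact with any atom.

    atoms: list of (x, y, z) coordinates of the receptor or the ligand.
    contact_cutoff: ASSUMED definition of 'steric contact', in angstroms.
    spacing: grid spacing; the text uses 1 angstrom.
    """
    # Limits of the structure: minima and maxima in x, y and z.
    lo = [min(a[i] for a in atoms) for i in range(3)]
    hi = [max(a[i] for a in atoms) for i in range(3)]
    # Number of grid steps along each axis, including both limits.
    steps = [int(math.floor((hi[i] - lo[i]) / spacing)) + 1 for i in range(3)]
    points = []
    for ix, iy, iz in itertools.product(*(range(n) for n in steps)):
        p = (lo[0] + ix * spacing, lo[1] + iy * spacing, lo[2] + iz * spacing)
        if any(math.dist(p, a) <= contact_cutoff for a in atoms):
            points.append(p)
    return points


def write_steric_file(path, mode, points):
    """First line records the filtering mode; then one xyz triple per line."""
    with open(path, "w") as fh:
        fh.write(mode + "\n")  # e.g. "receptor" or "ligand" (hypothetical keywords)
        for x, y, z in points:
            fh.write(f"{x:.3f} {y:.3f} {z:.3f}\n")
```

Scanning every grid point against every atom is quadratic; a spatial index would be the obvious refinement for receptor-sized structures.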

(4) MIN_MATCH 

MIN_MATCH is an integer defining the minimum number of
Cα-Cβ vectors that must match between the query and the database entry
before VECTRIX will consider a clique to be a hit.

Having entered the appropriate parameters, the first general
step of the VECTRIX program is to calculate the distance matrix of the
Cα-Cβ vectors of the query (see SCHEME A). Each database entry is
then read in turn and the Cα-Cβ distance matrix of the framework protein
is calculated. The clique detection algorithm of Ho & Marshall, 1993,
supra, is used to identify geometric matches between the query and the
database entry. If no match is found, another database file is read and
processed. If a hit is found, some further processing is required,
because the clique detection algorithm only finds the entries with Cα-Cβ
vectors that match those in the query. It does not check for steric
integrity, that is, the structural complementarity that each hit possesses
with regard to the 3D space in which it must reside. The VECTRIX
program uses the 'steric file' to calculate the number of atoms in the hit
which invade the receptor space or the non-ligand space, depending on
whether it is in receptor-based or ligand-based filtering mode. Some
parts of the framework protein are not essential to binding to the target
protein via the 'matched' functional residues. The non-essential part
includes the side chains that are not in the matches and the N- or
C-terminal residues up to the matched residue or the first cysteine
residue. The essential atoms of a residue are the backbone atoms (N, H,
CA, HA, C, O) and the side chain atoms that are attached to the CA atom
(CB, 1HA and 2HA). The essential residues are those between the first
and the last cysteine. If no cysteine is found in the protein, the essential
residues are defined to be those between the first and the last matched
residues. The VECTRIX program counts and outputs the number of
essential atoms as well as the number of essential atoms that invade the
receptor or non-ligand space. Furthermore, for each subset of vectors
defined in the query file, the VECTRIX program counts and outputs the
number of matched vectors in the subset. The results are written to an
output file, and another database entry is read and the process repeated
until the end of the database is reached.
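The per-hit bookkeeping described above can be sketched as follows. This is an illustrative reading of the text, not the VECTRIX source. The atom-tuple layout, the distance cutoff used to decide that an atom "invades" a grid point, and the choice to treat all atoms of matched residues as essential are assumptions.

```python
import math

# Backbone atoms plus the atoms attached to CA, as listed in the text.
ESSENTIAL_NAMES = {"N", "H", "CA", "HA", "C", "O", "CB", "1HA", "2HA"}


def essential_atoms(atoms, matched_residues):
    """atoms: list of (resnum, resname, atomname, (x, y, z)) tuples.

    Essential residues run from the first to the last cysteine, or from the
    first to the last matched residue when the protein has no cysteine.
    ASSUMPTION: every atom of a matched residue is also kept as essential.
    """
    cys = sorted({resnum for resnum, resname, _, _ in atoms if resname == "CYS"})
    if cys:
        first, last = cys[0], cys[-1]
    else:
        first, last = min(matched_residues), max(matched_residues)
    return [a for a in atoms
            if first <= a[0] <= last
            and (a[2] in ESSENTIAL_NAMES or a[0] in matched_residues)]


def count_invading(ess_atoms, grid, mode, cutoff=1.0):
    """Count essential atoms inside receptor space ('receptor' mode) or
    outside ligand space ('ligand' mode). The cutoff is ASSUMED (angstroms)."""
    def near_grid(xyz):
        return any(math.dist(xyz, g) <= cutoff for g in grid)

    if mode == "receptor":
        return sum(1 for a in ess_atoms if near_grid(a[3]))
    if mode == "ligand":
        return sum(1 for a in ess_atoms if not near_grid(a[3]))
    raise ValueError(f"unknown steric mode: {mode!r}")


def subset_counts(matched_query_vectors, subsets):
    """Number of matched query vectors that fall in each SUBSET."""
    return {name: len(members & matched_query_vectors)
            for name, members in subsets.items()}
```

The invasion fraction used later by POSTVEC would then be `count_invading(...) / len(essential_atoms(...))`.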
POSTVEC 

By design, the VECTRIX program outputs all hits that have
a number of matches greater than or equal to MIN_MATCH. The
POSTVEC program is written for post-VECTRIX filtering. The filtering is
based on the steric contact, a new minimum number of matches and the
count of matches in each SUBSET defined in the query.file. The POSTVEC
program requires at least three parameters, i.e.

postvec vectrix_out.file min_match max_invade_fraction
<subset1_num> <subset2_num> ... <subsetX_num>

where: 

vectrix_out.file is the name of the VECTRIX output file.

min_match represents the new minimum number of matches
required.

max_invade_fraction defines the maximum allowable
fraction of invasion of receptor/non-ligand space. That is,
hits are rejected if the fraction of invasion is greater than
max_invade_fraction, e.g. 0.1 for 10%.

subset1_num represents the number of matches required for
subset 1.

subset2_num represents the number of matches required for
subset 2.

The brackets <> denote optional parameters. That is, the subset
parameters are optional; if they are not defined, then there is
no subset filtering.
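A sketch of the POSTVEC acceptance test follows. It assumes each VECTRIX hit has already been parsed into a record holding its match count, essential-atom count, invading-atom count and per-subset match counts. The `Hit` class and the function names are hypothetical. The three rejection rules mirror the parameter descriptions above.

```python
from dataclasses import dataclass, field


@dataclass
class Hit:
    """One VECTRIX hit (hypothetical record layout)."""
    name: str
    n_matches: int
    n_essential: int
    n_invading: int
    subset_counts: list = field(default_factory=list)  # matches per SUBSET, query-file order


def parse_postvec_args(argv):
    """argv: [vectrix_out_file, min_match, max_invade_fraction, subset1_num, ...]."""
    if len(argv) < 3:
        raise SystemExit("usage: postvec vectrix_out.file min_match "
                         "max_invade_fraction [subset1_num ...]")
    return argv[0], int(argv[1]), float(argv[2]), [int(x) for x in argv[3:]]


def passes(hit, min_match, max_invade_fraction, subset_mins):
    """Apply the POSTVEC criteria; subset filtering only when subset_mins is non-empty."""
    if hit.n_matches < min_match:
        return False
    fraction = hit.n_invading / hit.n_essential if hit.n_essential else 0.0
    if fraction > max_invade_fraction:
        return False
    for i, required in enumerate(subset_mins):
        found = hit.subset_counts[i] if i < len(hit.subset_counts) else 0
        if found < required:
            return False
    return True
```

For example, `parse_postvec_args(["vectrix.out", "5", "0.1", "2", "1"])` requires at least 5 matches, at most 10% invasion, and at least 2 and 1 matches in subsets 1 and 2 respectively.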

The output of POSTVEC is a set of PDB files of the filtered hits. These
PDB files are in the same frame of reference as the query files, enabling
simple display and comparison.

EXAMINATION OF HITS USING INSIGHT II 

An Insight II macro, EXAMINE_HIT.BCL, was written to
enable easy viewing of the hits obtained from POSTVEC. Before using
EXAMINE_HIT.BCL, an Insight II .psv file, EXAMINE.PSV, must be
created. This file contains the ligand or the receptor in the same
reference coordinate frame as the query vectors, and is used as the
background to display the hits. Normally the ligand/receptor is set to dull
colours and the query vectors are highlighted with thick lines, with Cα
coloured red and Cβ coloured yellow. In Insight II, sourcing the
EXAMINE_HIT.BCL file allows the hits to be visualised through the next
and previous buttons, or by clicking on the filename of the hit. The hits
are displayed together with the query and the receptor/ligand. Steric
contacts and matched vectors are highlighted.

An alternative representation of the VECTRIX program is
shown in Scheme B.

Alternatively, other applicable clique detection algorithms
are provided by Brint & Willett, 1987, J. Mol. Graphics, supra, and Brint &
Willett, 1987, Chem. Inf. Comput. Sci., supra, which are herein
incorporated by reference.

Using a series of automated scripts outlined in Scheme C, 
the database of small cysteine rich proteins is updated weekly by 
searching the Brookhaven database for suitable candidates. 

Suitably, said one or more hits correspond to respective 
entries identified by said algorithm according to said query. 

Should there be more than one hit, it is desirable to evaluate 
and rank each hit. The most important factor in evaluating hits is "steric 
integrity", or the 3D structural complementarity of a hit when compared to 
a query. Several algorithms have been developed which could be utilized 
for this purpose. Such algorithms would include an algorithm used by the 
FOUNDATION program, algorithms which check van der Waals overlap
of each said hit with said query (Allinger et al., 1972, supra, which is
herein incorporated by reference), or algorithms which calculate volume
in common and volume of extra space with respect to each said hit and
said query (Marshall et al., 1979, supra, which is herein incorporated by
reference).

It is also contemplated that other algorithms may be useful. 
For example, simple distance calculations between said hit and said 
query after superimposition thereof may be used to identify 3D spatial 
differences therebetween. 
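POSTVEC writes hits in the same frame of reference as the query, so the "simple distance calculations" mentioned above can be sketched without an explicit superposition step. The sketch below assumes the coordinates are already superimposed. The function names and the protrusion threshold are hypothetical.

```python
import math


def rmsd(pairs):
    """RMS deviation over matched (query_xyz, hit_xyz) pairs that are already superimposed."""
    return math.sqrt(sum(math.dist(q, h) ** 2 for q, h in pairs) / len(pairs))


def protruding_atoms(hit_atoms, query_atoms, threshold):
    """Hit atoms whose nearest query atom is farther than `threshold` angstroms."""
    return [h for h in hit_atoms
            if min(math.dist(h, q) for q in query_atoms) > threshold]
```

A hit with a large number of protruding atoms, or a high RMSD over its matched atoms, would rank lower under this scheme.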

An outline of the process that is currently used for scoring is
given in Scheme D. These procedures post-process output data from the
POSTVEC program, and may eventually be incorporated into the program
to provide a semi-automated process. In the current filtering process,
steps 1 and 2 evaluate the conformational stability of the engineered hit,
and step 3 optimizes the fit between a receptor and the hit. Note that this
filtering process is described with reference to scoring hits in terms of
their predicted interaction with a receptor, e.g. a cytokine and a cytokine
receptor. One skilled in the art will realize that the principles outlined in
Scheme D are applicable to any protein-protein interaction. For example,
when a crystal structure is not known, scoring procedures can be
implemented to ensure that the hit is subsumed by the steric surface of
the ligand.

It is also envisaged that evaluation and ranking of each said 
hit may be achieved manually by a person skilled in the art, although this
would be a less preferred method, particularly when there is a plurality of 
hits to be evaluated and ranked. 

In light of the foregoing, the skilled person will understand 
that the method of the invention provides framework protein "hits" which 
may be the subject of further modification. 

As used herein in this context, a framework protein hit has 
"structural similarity" to a sample protein by virtue of possessing amino
acid sequence similarity, topographical similarity and/or chemical 
similarity. For example, a framework protein "hit" has a surface 
topography and/or chemistry which is similar to that of a receptor-binding 
region of a cytokine. Substitution of framework protein amino acids by 
sample protein amino acids preferably increases the degree of similarity. 

Preferably, a framework protein identified as a hit has 
greater stability than the sample protein. 

As used herein in this context, "stability" includes resistance
to degradation by proteolytic enzymes, to temperature variation, to
denaturation by chaotropic agents and/or denaturing detergents, to
changes in pH or pH extremes, and/or to REDOX changes or extremes.

It will be appreciated that the said two or more amino acids 
used for creating a query at step (iii) of the method of the invention 
constitute at least a portion of one or more functional regions of said 
sample protein. These amino acids may be the same as, or different to, 
said at least one amino acid used in modifying the hit. 

In one embodiment, an amino acid sequence of a framework 
protein which corresponds to a hit is modified by substituting at least one 
amino acid residue thereof with at least one amino acid residue of said 
sample protein. Preferably, the said at least one amino acid of the sample 
protein is/are selected from those required for a function of said sample 
protein. This engineering process can involve addition, deletion or 
insertion of amino acids as desired. 

As already discussed, the purpose of such modification is to 
impart a particular property, characteristic or function to a framework 
protein. The method of the invention takes account of the fact that the 
amino acid residues essential to a particular function will often be non- 
contiguous with respect to primary sequence. These "scattered" amino 
acid residues may nevertheless form at least a portion of one or more 
functional regions, each of which occupies a distinct location and
orientation in 3D space. 

Advantageously, modification of the framework protein hit 
will be performed so as to effectively "transfer" one or more functional 
region(s) of the sample protein thereto. Transfer is achieved by 
incorporating amino acid residues from one or more functional regions (as 
hereinbefore defined) of the sample protein into an amino acid sequence 
of a framework protein. Such modification will be performed so as to 
engineer a protein which incorporates amino acid residues of said one or 
more functional region(s) appropriately located and oriented in 3D space. 

In an embodiment, said framework protein is modified to 
function as a cytokine mimetic. In this regard, modification of a framework 
protein may be performed so that said framework protein is capable of 
exhibiting a function similar to that of said sample protein (such as in the 
case of an agonist), or alternatively, so that it inhibits a function of said 
sample protein (such as in the case of an antagonist). 

However, the scope of the present invention extends to 
engineering proteins with any desired function by substituting amino acid 
residues of a framework protein. For example, an enzyme might be 
engineered to catalyze conversion of a substrate, or a transcription factor 
may be engineered to bind its cognate DNA sequence and to form 
complexes with other transcription factors necessary to promote 
transcription. 

In the case where a cytokine mimetic is to be engineered, a
suitable approach is to modify an amino acid sequence of a framework 
protein (corresponding to a hit) by substituting amino acid residue(s) 
thereof with amino acid residue(s) of said cytokine selected from those 
amino acid residues which are required for binding of said cytokine to a 
specific receptor. Often, a biological response is elicited by a cytokine 
binding to two or more receptor molecules, thereby cross-linking said 
receptor molecules. A cytokine antagonist is therefore engineered by 
modifying a framework protein to include amino acid residues of a
functional region required for binding one receptor molecule but not the 
other; an agonist is engineered by including amino acid residues of two 
functional regions, which together are required for binding and cross- 
linking of two receptor molecules. The functional regions required for 
binding said two receptor proteins occupy unique locations and 
orientations in 3D space. Engineering of an agonist therefore requires 
that the relative 3D location and orientation of each functional region is 
such that receptor binding and cross-linking is achievable. 

In addition to direct substitution of amino acid residues of 
said cytokine selected from those amino acid residues which are required 
for binding of said cytokine to a specific receptor, several other design 
processes may be used. In cases where the atomic structures of the
sample protein and its receptor are known, de novo design programs
such as X-SITE (Laskowski et al., 1996, Journal of Molecular Biology,
175; Bohm, 1992, J. Comput. Aided Mol. Des. 6 69, which are herein
incorporated by reference) may be used to guide the engineering of
auxiliary binding epitopes into the hit that modulate activity. The auxiliary
binding epitopes may be natural or unnatural amino acids that may be
conjugated to additional functionality, such as protecting groups used in
synthetic peptide chemistry.

Programs that measure the electrostatic similarity of mutated
frameworks and the sample protein, or the electrostatic complementarity
of the mutated framework and the sample protein receptor, such as DelPhi
(Honig & Nicholls, 1987, 'DelPhi', Computer Program, Department of
Biochemistry and Molecular Biophysics, Columbia University, which is
herein incorporated by reference), may be employed to identify
unmutated areas of the mutated framework that may be deleterious to
activity.

Programs that measure buried surface areas, such as
NACCESS (Hubbard & Thornton, 1993, 'NACCESS', Computer Program,
Department of Biochemistry and Molecular Biology, University College
London, which is herein incorporated by reference), may be used to
analyse and compare the buried surface areas of the sample protein and
the mutated framework.
Regions of proteins are often disordered and absent from the
X-ray or NMR structure. When residues are absent in the binding region
of the sample protein, techniques such as homology modelling and loop
searching may be employed to construct a complete model of the atomic
coordinates.

Whichever approach is taken, modification of said amino
acid sequence of said framework protein requires that considerations of
maintaining stereochemical and secondary structural integrity apply. It is
therefore important to be able to predict any structural effects induced in
said framework protein by such modification. This can be accomplished
with algorithms well known in the art, as described in Bowie et al., 1991,
Science 253 164-170; Luthy et al., 1992, Nature 356 83-85; and
Laskowski et al., 1993, J. Appl. Cryst. 26 283-91.

Preferably, a modified framework protein would be
chemically synthesized. Alternatively, it may be produced by chemically
synthesizing a polynucleotide sequence which encodes an amino acid
sequence of said modified framework protein. Techniques applicable to
the chemical synthesis of proteins and nucleic acids are well known in the
art, and an example of such a technique will be provided hereinafter.

Alternatively, a polynucleotide sequence which encodes an
amino acid sequence of a framework protein corresponding to said hit
may be modified by in vitro mutagenesis techniques, resulting in a
modified polynucleotide sequence encoding an amino acid sequence of
said modified framework protein. Suitable in vitro mutagenesis techniques
are well known in the art, such as described in Chapter 8 of CURRENT
PROTOCOLS IN MOLECULAR BIOLOGY (Ausubel et al., Eds; John
Wiley & Sons Inc., 1995), which is herein incorporated by reference.
Phage display is also contemplated, which technique is well known in the
art. An exemplary phage display method is provided in Smith et al., 1998,
J. Mol. Biol. 277 317, which is herein incorporated by reference.

According to one embodiment of the invention, each said
entry in the database corresponds to a small cysteine-rich protein of not
more than 70 amino acid residues, initially represented in Cartesian
coordinate form, but subsequently processed into a distance matrix
representation of Cα-Cβ vectors prior to searching. Said query is in the
form of a distance matrix representation of Cα-Cβ vectors corresponding
to amino acid side-chains of said sample protein, said amino acid
side-chains being required for high-affinity binding of said sample protein
to a receptor protein. In a particular embodiment, the sample protein is
selected from the group consisting of GH, IL-4, G-CSF and IL-6.
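The conversion of a database entry from Cartesian coordinates to Cα-Cβ vectors can be sketched as follows, reading the standard fixed-column PDB ATOM record layout. Several assumptions apply. The sketch expects a single model per file, so multi-model NMR entries would be split at MODEL records first. Alternate locations and insertion codes are ignored. The helper names are hypothetical. Glycine drops out naturally because it has no Cβ.

```python
def parse_ca_cb(pdb_lines):
    """Return {(chain, resseq): {'CA': xyz, 'CB': xyz}} from PDB ATOM records."""
    residues = {}
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue
        name = line[12:16].strip()           # atom name, columns 13-16
        if name not in ("CA", "CB"):
            continue
        key = (line[21], int(line[22:26]))   # chain ID, residue sequence number
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        residues.setdefault(key, {})[name] = xyz
    return residues


def ca_cb_vectors(residues):
    """(key, CA, CB) for residues having both atoms; glycine has no CB and is skipped."""
    return [(key, r["CA"], r["CB"])
            for key, r in sorted(residues.items())
            if "CA" in r and "CB" in r]


def is_small_entry(residues, max_residues=70):
    """Size criterion from the text: not more than 70 amino acid residues."""
    return len(residues) <= max_residues
```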

In the case where said sample protein is human Growth 
Hormone (hGH), and said receptor protein is human Growth Hormone 
Receptor (hGHR), the Cα-Cβ vectors of hGH are a simplification of the
3D location and orientation of the amino acid side-chains of hGH which 
contact hGHR during high-affinity binding, and are required for such 
binding. 

In this case, said small cysteine-rich protein corresponding
to a hit is scyllatoxin, the amino acid sequence of which (shown in FIG. 1)
is modified so that a protein produced with that amino acid sequence is
potentially capable of functioning as an hGH antagonist. The particular
Cα-Cβ vectors used in the search process were Asp A171; Lys A172; Glu
A174; Thr A175; Phe A176; Arg A178; Ile A179; Lys A41; Leu A45; Pro
A48; Glu A56; Arg A64; and Gln A68. The particular amino acid residues
of hGH incorporated into the amino acid sequence of scyllatoxin were
selected from those required for high-affinity binding of hGH to hGHR (as
shown above) and which topographically matched residues of
scyllatoxin. Determination of which amino acids of scyllatoxin could be
substituted without drastically affecting structural integrity was achieved
with the assistance of the INSIGHT II modelling program.

The SCY01-SCY03 peptides, designed as potential hGH 
antagonists, were chemically synthesised with the respective amino acid 
sequences shown in FIG. 1. 

In another case, said small cysteine-rich protein
corresponding to a hit is a marine worm toxin (VIB). Said hit was identified
by database searching using a query which comprised Cα-Cβ vectors of
the following hGH amino acid residues: Lys A41; Leu A45; Pro A48; Glu
A56; Arg A64; Gln A68; Asp A171; Lys A172; Glu A174; Thr A175; Phe
A176; Arg A178; Ile A179; Arg A8; Leu A9; Asn A12; Leu A15; Arg A16;
His A18; Arg A19; Tyr A103; Asp A116; Leu A117; Glu A119; and Thr A123.

An amino acid sequence of said hit (VIB) is shown in FIG. 2,
and an amino acid sequence of a protein engineered by modifying one or
more amino acids of said hit (VIB01) is also shown in FIG. 2. The
particular amino acid residues of hGH used to modify said hit were
selected from those forming the agonist-binding functional region of hGH,
as indicated in FIG. 2. Overlap between hGH and said marine worm toxin
is shown in FIG. 3, which serves to emphasize the ability of the method of
the invention to identify hits which match cytokine agonist functional
regions.

The peptides designed according to the hGH agonist 
regions constitute candidate hGH agonists.

In light of the foregoing, it will be understood that the 
present invention contemplates engineered proteins such as according to 
the second-mentioned aspect. 

In one embodiment, the amino acids of said another protein 
present in the engineered protein represent at least one functional region 
of said another protein. 

In another embodiment, the amino acids of said another 
protein present in the engineered protein represent two functional regions 
of said another protein. 

As well as providing amino acids which are non-contiguous 
in primary sequence, said another protein may also provide amino acids
which are contiguous in primary sequence. 

In one embodiment, the engineered protein has an amino 
acid sequence selected from the group consisting of SCY01, SCY02,
SCY03, ERP01, ERP02, ERP03 and VIB01.

It will also be appreciated that according to both the first and 
second aspects of the invention, homologs of engineered proteins are 
contemplated. A person skilled in the art will realize that conservative 
amino acid substitutions, deletions and additions can be made such that a 
protein will retain a particular function notwithstanding such changes in 
amino acid sequence. All such homologs fall within the scope of the 
invention described herein. 

In order that the present invention may be understood in 
more detail, the skilled person is directed to the following non-limiting 
examples. 

EXAMPLES 



EXAMPLE 1 

Overview of database search strategy 

A schematic description of the computational approach
developed by the present inventors, program VECTRIX, is shown in FIG.
4. The first step involves the creation of a library of small cysteine-rich
proteins. Currently, 344 such proteins (each with fewer than 70 amino
acid residues), comprising over 3779 experimentally derived 3D
structures, have been extracted from the BROOKHAVEN database.
However, it would also be feasible to construct databases using
theoretically derived structures, such as those obtained by homology
modelling, threading or other techniques known in the field.

Each structure is simplified, in turn, into Cα-Cβ vectors (step
a), essentially resulting in a database of entries (step b). For the
purposes of searching the database, each query is in the form of a
distance matrix representation of Cα-Cβ vectors (step c). However, it is
possible to represent Cα-Cβ vectors by other means, such as dihedral
angles (5) or and 03 angles. A simple description of these types of
representations with respect to a Cα-Cβ vector pair is shown in FIG. 5.
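One way to derive angular descriptors for a Cα-Cβ vector pair is sketched below: the Cβi-Cαi-Cαj-Cβj dihedral angle and the angle between the two vector directions. Which angles are depicted in FIG. 5 cannot be recovered from this text. These two descriptors are therefore illustrative assumptions, not the inventors' definitions.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _norm(a):
    return math.sqrt(_dot(a, a))


def dihedral(p0, p1, p2, p3):
    """Dihedral angle p0-p1-p2-p3 in degrees, between -180 and 180."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    y = _norm(b2) * _dot(b1, _cross(b2, b3))
    x = _dot(_cross(b1, b2), _cross(b2, b3))
    return math.degrees(math.atan2(y, x))


def vector_angle(u, v):
    """Angle between two vectors in degrees (cosine clamped against rounding)."""
    c = _dot(u, v) / (_norm(u) * _norm(v))
    return math.degrees(math.acos(max(-1.0, min(1.0, c))))


def pair_angles(ca_i, cb_i, ca_j, cb_j):
    """Illustrative angular descriptors for one Cα-Cβ vector pair."""
    return (dihedral(cb_i, ca_i, ca_j, cb_j),
            vector_angle(_sub(cb_i, ca_i), _sub(cb_j, ca_j)))
```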

The search algorithm compares the distance matrix
representing the query Cα-Cβ vectors with the distance matrix
representing the Cα-Cβ vectors of each entry (step d). Comparison of
topographical similarities was chosen because Cα-Cβ vectors are
common to all amino acid side chains (except glycine), and are
essentially anchored to the backbone. They therefore represent the initial
orientation of the amino acid side chain in 3D space, which would
probably not undergo significant change upon interaction with another
protein. It is envisaged that the extra atoms of the side chain will provide
some degree of induced fit during such an interaction.
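A distance-based description of a vector pair can be sketched by recording the four inter-atomic distances between the two Cα and two Cβ atoms. This encoding is an assumption made for illustration, since the exact matrix layout used by VECTRIX is not given here. Two pairs are treated as matching when every distance agrees within a tolerance. The comparison assumes that query vector i corresponds to entry vector k, and query vector j to entry vector l.

```python
import math
from itertools import combinations


def pair_distances(v_i, v_j):
    """v = (ca_xyz, cb_xyz). Returns (CAi-CAj, CBi-CBj, CAi-CBj, CBi-CAj) distances."""
    (ca_i, cb_i), (ca_j, cb_j) = v_i, v_j
    return (math.dist(ca_i, ca_j), math.dist(cb_i, cb_j),
            math.dist(ca_i, cb_j), math.dist(cb_i, ca_j))


def distance_matrix(vectors):
    """Map each index pair (i, j) with i < j to its four distances."""
    return {(i, j): pair_distances(vectors[i], vectors[j])
            for i, j in combinations(range(len(vectors)), 2)}


def pairs_match(d_query, d_entry, tol=1.0):
    """True if all four distances agree within tol (angstroms)."""
    return all(abs(a - b) <= tol for a, b in zip(d_query, d_entry))
```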

Alternative, more restricted approaches would use
secondary structural features such as α-carbon backbone structures,
together with suitable algorithms well known in the field (Holm & Sander,
1994, supra; Alexandrov, 1996, supra; Alexandrov & Fisher, 1996, supra;
and Orengo, 1994, supra).

The intermolecular geometric relationship of Cα-Cβ vectors
is compared using the clique-detection algorithm of Ho & Marshall, 1993,
supra, which identifies hits according to a user-defined minimum number
of matching vectors. However, other algorithms well known in the art
would also be useful in this regard.
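The clique-detection step can be sketched with a generic approach. First, build a correspondence graph whose nodes are (query vector, entry vector) pairings, joining two nodes when the four inter-atomic distances of the two pairings agree within a tolerance. Then enumerate maximal cliques with the Bron-Kerbosch algorithm (with pivoting). This is a swapped-in textbook method, not a reproduction of the Ho & Marshall implementation. The names and the 1.0 Å default tolerance are illustrative.

```python
import math
from itertools import combinations


def _dists(a, b):
    """Four inter-atomic distances between vectors a = (CA, CB) and b = (CA, CB)."""
    (ca1, cb1), (ca2, cb2) = a, b
    return (math.dist(ca1, ca2), math.dist(cb1, cb2),
            math.dist(ca1, cb2), math.dist(cb1, ca2))


def correspondence_graph(query, entry, tol):
    """Nodes are (query index, entry index); edges join geometrically consistent pairings."""
    nodes = [(q, e) for q in range(len(query)) for e in range(len(entry))]
    adj = {n: set() for n in nodes}
    for (q1, e1), (q2, e2) in combinations(nodes, 2):
        if q1 == q2 or e1 == e2:
            continue  # each vector may be used only once per match
        dq = _dists(query[q1], query[q2])
        de = _dists(entry[e1], entry[e2])
        if all(abs(a - b) <= tol for a, b in zip(dq, de)):
            adj[(q1, e1)].add((q2, e2))
            adj[(q2, e2)].add((q1, e1))
    return adj


def bron_kerbosch(r, p, x, adj, out):
    """Collect all maximal cliques into `out` (Bron-Kerbosch with pivoting)."""
    if not p and not x:
        out.append(r)
        return
    pivot = max(p | x, key=lambda n: len(adj[n] & p))
    for v in list(p - adj[pivot]):
        bron_kerbosch(r | {v}, p & adj[v], x & adj[v], adj, out)
        p = p - {v}
        x = x | {v}


def find_hits(query, entry, min_match, tol=1.0):
    """Maximal cliques with at least min_match matched vectors, largest first."""
    adj = correspondence_graph(query, entry, tol)
    cliques = []
    bron_kerbosch(set(), set(adj), set(), adj, cliques)
    return sorted((sorted(c) for c in cliques if len(c) >= min_match),
                  key=len, reverse=True)
```

Here `min_match` plays the role of the MIN_MATCH parameter: cliques smaller than it are discarded.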

As a result of step d, one or more hits may be identified. If a 
single hit is obtained, no ranking is necessary. If the number of hits is 
small, it may be possible for the skilled person to evaluate and rank each 
hit individually (step e). If, however, the number of hits is large, such 
manual comparison would be more difficult, and an automated process is 
required. 

The most important factor in evaluating and ranking hits is 
steric integrity, that is, the structural complementarity that each hit
possesses with regard to the 3D space in which it must reside. For
example, if the query is in the form of a distance matrix representation of
Cα-Cβ vectors corresponding to the receptor-binding amino acid
side-chains of a hormone, then a hit must be evaluated in terms of whether it
would invade the 3D space accessed by the receptor upon binding the 
cytokine. Several algorithms have been developed that are useful for this 
purpose. For example, the FOUNDATION program of Ho & Marshall, 
1993, supra uses various flood filling algorithms to define the 3D space 
occupied by the receptor (as determined from the crystal structure of the 
receptor), and then uses atom-checking routines to establish whether the 
atoms of a hit reside in the binding "cavity" of the receptor. Other 
approaches include placing molecules in a cube containing lattice points 
and checking the van der Waals overlap of each molecule (Allinger, 1972, 
In: Pharmacology and the future of Man. Proceedings of the 5th 
International Conference on Pharmacology pp 57-63). A related method 
involves the calculation of the volume in common and the volume of extra 
space of two molecules (Marshall et al., 1979, The Conformational 
Parameter in Drug Design: The active analog approach. 112 205). 
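The lattice-based comparisons cited above can be approximated in a few lines. Place both molecules in one lattice and mark each point that falls inside any atom's van der Waals sphere. Points shared by both molecules give the volume in common; points occupied by only one give the volume of extra space. This is a simplified sketch, not the Allinger or Marshall implementation. The radii, lattice spacing and function names are assumptions.

```python
import itertools
import math


def _inside(point, molecule):
    """True if point lies within the van der Waals sphere of any atom."""
    return any(math.dist(point, centre) <= radius for centre, radius in molecule)


def lattice_volumes(mol_a, mol_b, spacing=0.5):
    """Approximate (common, extra_in_b, extra_in_a) volumes in cubic angstroms.

    Each molecule is a list of ((x, y, z), vdw_radius) tuples; mol_a could be
    the query and mol_b a superimposed hit.
    """
    atoms = mol_a + mol_b
    pad = max(r for _, r in atoms)
    lo = [min(c[i] for c, _ in atoms) - pad for i in range(3)]
    hi = [max(c[i] for c, _ in atoms) + pad for i in range(3)]
    steps = [int((hi[i] - lo[i]) / spacing) + 1 for i in range(3)]
    common = only_a = only_b = 0
    for idx in itertools.product(*(range(n) for n in steps)):
        p = tuple(lo[i] + idx[i] * spacing for i in range(3))
        in_a, in_b = _inside(p, mol_a), _inside(p, mol_b)
        if in_a and in_b:
            common += 1
        elif in_a:
            only_a += 1
        elif in_b:
            only_b += 1
    cell = spacing ** 3  # volume represented by one lattice point
    return common * cell, only_b * cell, only_a * cell
```

A smaller spacing trades run time for a closer approximation to the true overlap volumes.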

It is also possible to use simple distance calculations 
between query and hit, after the two have been superimposed, to identify 
if the hit protrudes from the space occupied by the query structure. This is 
an approach the present inventors have implemented in an algorithm 
currently being constructed. 

It is also important to be able to predict any drastic structural 
effects that may result from amino acid sequence changes when 
modifying a hit. This will, in part, be achieved by maximizing the degree of 
amino acid sequence identity of the modified hit with that of the protein (or 
area of the protein) to which the query corresponded. In addition, the 
stereochemistry and degree of secondary structure disruption of the
modified hit can be evaluated using standard algorithms which check
protein stereochemistry on an amino acid by amino acid basis. Similarly,
secondary structure prediction algorithms can be used to evaluate the 
potential for an amino acid sequence modification of a hit to disrupt 
secondary structure. 

Finally, the present inventors plan to utilize molecular 
surfaces to compare various physicochemical properties of a query and 
hit. Charge, electrostatic potential, hydrophobicity, occupancy, and 
hydrogen bonding potential have all been mapped to protein surfaces, 
providing detailed comparisons between proteins. A method for 
quantitating the degree of similarity between two molecular surfaces has 
been developed, in which a gnomonic projection casts the calculated 
values of a given property onto a spherical surface (Danziger & Dean,
1985, J. Theor. Biol. 116 215). Two such surfaces can then be 
superimposed using pairs of corresponding atoms. This algorithm would 
be very useful for comparing query protein with a hit, to allow fine tuning 
of amino acid residues of the protein corresponding to the hit, and to 
improve steric and electrochemical complementarity. 

Since the database searching algorithms (such as provided 
by the VECTRIX program) applicable to the method of the invention allow 
for the identification of partial hits, there is scope for a skilled person to 
use molecular modelling to identify additional regions on the surface of 
the protein corresponding to the partial hit for mimicking vectors missed in 
the database search. This could involve the use of D-amino acids or non- 
coded amino acids, for example, to achieve better mimicry when 
engineering a mimetic. 

In the following examples, the VECTRIX program has been 
applied to various sample proteins. 

EXAMPLE 2 

High Affinity hGH Antagonists 

Growth hormone (GH) is a pituitary cytokine that regulates 
many growth processes, such as the growth and differentiation of muscle, 
bone and cartilage cells. The growth hormone receptor (GHR) consists of
three domains:- 

(i) an extracellular domain that binds GH; 

(ii) a transmembrane domain; and 

(iii) a cytoplasmic domain involved in eliciting an 
intracellular signal upon cytokine binding. 

Intracellular signalling occurs as a result of dimerization of 
separate GHRs following sequential binding of each receptor to a single 
GH ligand. The first GHR binds to the high affinity site of GH, while the 
second GHR subsequently binds to this complex. In support of this model, 
the crystal structure of this complex shows two identical receptor 
molecules bound to dissimilar sites on a single human GH molecule 
(hGH; De Vos et al., 1992, Science 255 306). 

The high affinity site on hGH is concave and buries
approximately 1200 Å² of surface area, while the second binding site on
hGH buries approximately 900 Å² of surface area. A third region
contributing to the stability of the complex comprises an area of 500 Å²
buried by the receptor-receptor interaction.

The crystal structure also reveals that the actual contact 
areas of both the high affinity and low affinity sites of hGH are buried 
upon complexation with the receptors. 

In developing antagonists of hGH, the present inventors 
have sought to design molecules that mimic the high-affinity binding of 
hGH. Mutagenic studies of the amino acid residues within the high 
affinity binding site showed a dramatic decrease in affinity when certain of 
these amino acid residues were converted to alanine (Cunningham &
Wells, 1993, J. Mol. Biol. 234 554). In this regard, of the 31 amino acid
residues with buried side-chains, a mere eight (Lys A41; Leu A45; Pro
A61; Arg A64; Lys A172; Thr A175; Phe A176; and Arg A178) accounted
for approximately 85% of the total change in binding energy resulting
from substitution by alanine. A further five residues (Pro A48; Glu A56;
Gln A68; Asp A171; and Ile A179) essentially accounted for the
remainder of the binding energy.

The GH residues currently used in the design of antagonists 
are: Asp A171; Lys A172; Glu A174; Thr A175; Phe A176; Arg A178; Ile
A179; Lys A41; Leu A45; Pro A48; Glu A56; Arg A64; and Gln A68. It is
these amino acid residues of hGH which formed the basis of the query for 
the purposes of database searching. 

Scyllatoxin (pdb1scy) was returned as a hit framework that
matched a maximum of 7 vectors of the hGH high affinity surface. After
identification of this hit, molecular modelling studies were used to
optimise it, resulting in the design of SCY01, SCY02 and SCY03.
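The text does not describe how VECTRIX scores a match internally. Purely as an illustration, the sketch below shows one simple way a set of Cα-Cβ vectors could be compared with a candidate framework's vectors. Each residue is treated as a (Cα, Cβ) coordinate pair, and a proposed residue-to-residue correspondence is accepted if every inter-residue Cα-Cα, Cβ-Cβ and Cα-Cβ distance agrees within a tolerance (1 Å here, the tolerance quoted for the low affinity site search). The function name `vectors_match` and all coordinates are hypothetical, and finding the correspondence in the first place is not shown.

```python
import itertools
import math

Point = tuple[float, float, float]
Vector = tuple[Point, Point]  # (C-alpha, C-beta) coordinates of one residue


def vectors_match(query: list[Vector], candidate: list[Vector], tol: float = 1.0) -> bool:
    """Return True if all inter-residue distances agree within `tol` angstroms.

    `query[i]` is assumed to correspond to `candidate[i]`. Comparing internal
    distances makes the test independent of rotation and translation.
    """
    if len(query) != len(candidate):
        raise ValueError("query and candidate must contain the same number of vectors")
    for i, j in itertools.combinations(range(len(query)), 2):
        # Compare Ca-Ca, Cb-Cb, Ca-Cb and Cb-Ca distances between residues i and j.
        for a, b in ((0, 0), (1, 1), (0, 1), (1, 0)):
            d_query = math.dist(query[i][a], query[j][b])
            d_cand = math.dist(candidate[i][a], candidate[j][b])
            if abs(d_query - d_cand) > tol:
                return False
    return True


if __name__ == "__main__":
    # Two hypothetical residues 5 A apart, with Cb 1.5 A above Ca.
    query = [((0.0, 0.0, 0.0), (0.0, 0.0, 1.5)), ((5.0, 0.0, 0.0), (5.0, 0.0, 1.5))]
    # The same geometry translated by 10 A along x: should match.
    shifted = [((10.0, 0.0, 0.0), (10.0, 0.0, 1.5)), ((15.0, 0.0, 0.0), (15.0, 0.0, 1.5))]
    # Second residue moved 2 A further away: should not match at 1 A tolerance.
    moved = [query[0], ((7.0, 0.0, 0.0), (7.0, 0.0, 1.5))]
    print(vectors_match(query, shifted), vectors_match(query, moved))  # True False
```

Comparing internal distances rather than raw coordinates avoids having to superimpose the structures before each test.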

For example, molecular modelling studies (using INSIGHT
II) suggested that the C-terminal His of the scyllatoxin-based mimetics
could be removed, as it does not interact with the receptor. This
simplifies synthesis of the target molecule, because His residues are
prone to racemisation during peptide assembly. As shown in FIG. 1, the
mutated framework SCY01 was produced by transfer of 7 matching hGH
residues: R167, K168, D171, K172, E174, T175 and F176. Similarly,
SCY02 was designed by transfer of hGH residues D171, K172, E174,
T175, F176, R178 and I179, except that the affinity-matured hGH mutation
E174S was incorporated into SCY02. Similarly, SCY03 incorporated the
affinity-matured hGH mutations D171S and E174S. In this fashion,
several analogues incorporating different functional residues and
affinity-matured residues were designed from a single hit.

In addition, molecular modelling techniques were used to
optimise the amino acid functionality transferred to the new
framework. Using the atomic structure of hGHR, X-SITE (Laskowski et
al., 1996, supra) was used to predict binding sites for functional groups
that could be incorporated into the hit peptide. SCY13 was thus
developed from SCY02 and SCY03 with the aid of X-SITE, to incorporate
novel mutations and auxiliary groups. As shown in FIG. 1, SCY13 possesses
a D171Y mutation, a T175D mutation and an F176E(Fm) mutation. In
addition, an N4R mutation in the native scyllatoxin sequence was
incorporated based on the X-SITE results. These mutations were
incorporated to optimise the electrostatic interactions and to increase the
buried surface area of the modelled SCY-hGHR complex.

Molecular modelling studies indicated that SCY01, SCY02
and SCY03 would bury approximately 700 Å² when bound to hGHR,
whilst SCY13 would bury approximately 1000 Å² when bound to hGHR.
The modelling program DelPhi (Honig, B. & Nicholls, A. (1987), 'DelPhi',
Computer Program, Department of Biochemistry and Molecular
Biophysics, Columbia University) was used to compare the electrostatic
potential maps of hGH and the SCY peptides, with the conclusion that
there was good complementarity between hGH and the SCY peptides.

The scyllatoxin peptides SCY01-SCY03 and SCY13 (FIG.
1) were then synthesised using solid phase techniques (M. Schnolzer et
al., 1992, International Journal of Peptide and Protein Research, 40 180-193),
purified and oxidised. The products were fully characterised using
mass spectrometry, high performance liquid chromatography (HPLC) and
amino acid analysis (AAA). The secondary structure of the
engineered SCY molecules was determined by circular dichroism (CD) on
SCY01 and SCY02 (FIG. 6). The spectra showed a high helical content
consistent with the native SCY fold. In addition, CD indicated that the
helical structure was unchanged by the addition of helix-stabilising agents
such as TFE, by destabilising agents such as guanidine hydrochloride, or
by changes in temperature. This emphasises the favourable chemical
characteristics of these frameworks.

To confirm that the new engineered SCY
framework mimics the structure of the region of GH used as a query, the
structure of SCY01 was determined by NMR spectroscopy. As illustrated
in FIG. 7, there is close conformational overlap (RMS 0.45 Å)
between the functional residues on GH and the engineered surface of
SCY01. This validates the process of selecting a target protein,
simplifying the functional epitope into Cα-Cβ vectors, using these as a
query to identify new frameworks that match the shape of this query, and
synthesising, characterising and folding the new engineered framework:
the resulting engineered framework structurally matches the
functional epitope of the target protein.
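The text reports an RMS deviation of 0.45 Å but does not say how it was computed. A common way to obtain such a figure is to superimpose the matched atom sets optimally (the Kabsch algorithm) and then take the root-mean-square distance. The following minimal NumPy sketch assumes the matched Cα and Cβ coordinates of the GH functional residues and of the engineered framework are supplied as two equally ordered (N, 3) arrays. The function name and the random test coordinates are hypothetical.

```python
import numpy as np


def kabsch_rmsd(p: np.ndarray, q: np.ndarray) -> float:
    """RMSD between two (N, 3) coordinate arrays after optimal superposition.

    Rows of `p` and `q` must be matched atoms in the same order.
    """
    if p.shape != q.shape or p.ndim != 2 or p.shape[1] != 3:
        raise ValueError("expected two arrays of identical shape (N, 3)")
    # Remove translation by centring both sets on their centroids.
    p_c = p - p.mean(axis=0)
    q_c = q - q.mean(axis=0)
    # Covariance matrix and its singular value decomposition.
    h = p_c.T @ q_c
    u, _, vt = np.linalg.svd(h)
    # Correct for a possible reflection so that the result is a proper rotation.
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    # Rotate p onto q and compute the root-mean-square deviation.
    diff = p_c @ rot.T - q_c
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    p = rng.normal(size=(14, 3))  # e.g. 7 residues x (Ca, Cb); hypothetical coordinates
    theta = np.deg2rad(30.0)
    rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta), np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
    q = p @ rz.T + np.array([3.0, -2.0, 1.0])  # rotated and translated copy
    print(round(kabsch_rmsd(p, q), 6))  # 0.0
```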

To characterise the folding of SCY02 and
SCY03, NMR experiments were again carried out, but this time the
secondary shifts of the engineered and native SCY were compared
(Wishart et al., J. Biomolecular NMR 5 67). As expected, there is little
or no deviation in the CαH or NH shifts compared to the native SCY
molecule, indicating the correct fold and disulphide bond connectivity.
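Secondary shifts are the differences between observed chemical shifts and random-coil reference values. Comparing them residue by residue between the engineered and native peptides shows whether the fold has changed. The minimal sketch below assumes shifts are held in dictionaries keyed by residue number. The function names and all numbers are hypothetical; real random-coil values would come from the Wishart tables cited in the NMR methods.

```python
def secondary_shifts(observed: dict[int, float], random_coil: dict[int, float]) -> dict[int, float]:
    """Secondary shift (observed minus random-coil, in ppm) for each residue with both values."""
    return {res: observed[res] - random_coil[res] for res in observed if res in random_coil}


def max_abs_difference(a: dict[int, float], b: dict[int, float]) -> float:
    """Largest |a - b| over residues present in both dictionaries (ppm)."""
    common = a.keys() & b.keys()
    if not common:
        raise ValueError("no residues in common")
    return max(abs(a[res] - b[res]) for res in common)


if __name__ == "__main__":
    # Hypothetical H-alpha shifts (ppm) for three residues.
    random_coil = {1: 4.35, 2: 4.30, 3: 4.45}
    native = secondary_shifts({1: 4.05, 2: 3.95, 3: 4.60}, random_coil)
    engineered = secondary_shifts({1: 4.08, 2: 3.97, 3: 4.58}, random_coil)
    print(round(max_abs_difference(native, engineered), 2))  # 0.03
```

When the same reference table is used for both peptides, the random-coil term cancels in the engineered-versus-native comparison. It matters when each peptide's secondary shifts are interpreted on their own, for example as helical or extended.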

SCY01 was tested for biological function by bioassay using
the BaF3 cell line, which responds to GH. The results are
shown in FIG. 8. SCY01 was assayed at various concentrations for
its ability to inhibit BaF3 cell proliferation in response to either 0.5 ng/mL
hGH or, as a control, 50 Units/mL IL-3. The Ki calculated from these
experiments was approximately 200 µM, and no inhibitory activity was
observed against IL-3-induced proliferation. Thus, SCY01
selectively inhibited GH-stimulated proliferation.
This biological effect suggests that SCY01 is a candidate for further
investigation with regard to its mechanism of action.
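To put the hGH concentration used in this bioassay (0.5 ng/mL) on a molar scale comparable with the approximately 200 µM Ki, divide the mass concentration by the molar mass of hGH. This is a minimal sketch. The molar mass of about 22,000 g/mol is an assumption (close to that of the mature 22 kDa form of hGH), and the function name is hypothetical.

```python
def ng_per_ml_to_pmolar(conc_ng_per_ml: float, molar_mass_g_per_mol: float) -> float:
    """Convert a mass concentration (ng/mL) into a molar concentration (pM)."""
    if molar_mass_g_per_mol <= 0:
        raise ValueError("molar mass must be positive")
    grams_per_litre = conc_ng_per_ml * 1e-6  # 1 ng/mL = 1e-9 g / 1e-3 L = 1e-6 g/L
    molar = grams_per_litre / molar_mass_g_per_mol  # mol/L
    return molar * 1e12  # mol/L -> pmol/L


if __name__ == "__main__":
    HGH_MOLAR_MASS = 22_000.0  # g/mol; assumed, approximately the 22 kDa form of hGH
    print(f"0.5 ng/mL hGH is about {ng_per_ml_to_pmolar(0.5, HGH_MOLAR_MASS):.1f} pM")  # 22.7
```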

The SCY peptides showed extremely good stability in the
hGH assay buffer, as judged by HPLC of the peptide at various time points
over up to 72 hrs of incubation in the assay buffer. Preliminary studies
evaluated the bioavailability of SCY01 by exposing it to a variety of
proteases (trypsin, chymotrypsin and pepsin) and to blood serum proteins,
as described in MATERIALS AND METHODS. The results of the blood
serum stability test are presented in Table 2, and the results of the
enzyme stability tests are presented in Table 3. The SCY peptide was
found to be stable after 24 hrs in each case, while control peptides were
rapidly digested. This emphasises the favourable chemical
characteristics of disulfide-rich proteins.
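Stability over time is usually expressed as the percentage of intact peptide remaining. This is taken from the area of the intact-peptide RP-HPLC peak relative to the zero-time sample. A minimal sketch follows, assuming peak areas have already been integrated. The function name and example areas are hypothetical and are not the data of Tables 2 and 3.

```python
def percent_remaining(peak_areas: dict[float, float]) -> dict[float, float]:
    """Map each time point (h) to % intact peptide relative to the earliest time point."""
    if not peak_areas:
        raise ValueError("no time points supplied")
    t0 = min(peak_areas)
    area0 = peak_areas[t0]
    if area0 <= 0:
        raise ValueError("zero-time peak area must be positive")
    return {t: 100.0 * area / area0 for t, area in sorted(peak_areas.items())}


if __name__ == "__main__":
    # Hypothetical intact-peptide peak areas (arbitrary units) at 0, 1 and 24 h.
    areas = {0.0: 200.0, 1.0: 198.0, 24.0: 190.0}
    for t, pct in percent_remaining(areas).items():
        print(f"{t:>5.1f} h: {pct:5.1f} %")
```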



In this example the present inventors have taken a
functional epitope of hGH and successfully engineered it onto a new
disulfide-rich framework. This framework has appealing chemical
characteristics in terms of bioavailability and bioactivity when compared to
macromolecular proteins.
Experimental to Example 2
Vectrix results

Number of vectors searched: 15 - R167, K168, D171,
K172, E174, T175, F176, R178, I179, K45, P48, E56, R64, Q68.
Number of different frameworks selected (name: pdb code: number of vector
matches):

Scyllatoxin: pdb1scy (7)
Synthesis

As described in the General Materials and Methods section.
The peptides were fully characterised by mass spectrometry, reverse
phase high performance liquid chromatography (RP-HPLC) and amino
acid analysis (AAA).
Folding

The pure reduced peptides SCY01-03 were folded in
0.1 M NH4HCO3, stirred overnight at room temperature at a peptide
concentration of approximately 0.3 µM per mL, with folding monitored by
HPLC and mass spectrometry. The folded peptide was isolated by
preparative HPLC. The correct disulphide connectivity for SCY01 was
determined by full structure analysis by NMR. Folding with oxidised and
reduced glutathione in a ratio of 100:10:1 GSH:GSSG:peptide, and
published methods using 5 mM GSSG to 0.5 mM GSH in NaPO4 buffer
pH 7.4, were also carried out and gave identically folded material.
Folding the crude peptide in exactly the same manner gave a yield
equivalent to that obtained with the pure peptide. The oxidation of SCY13
was complicated by the Fm group attached to the Glu. SCY13 was oxidised
using a 30% TFE solution in the presence of 5 mM GSSG to 0.5 mM GSH
in NaPO4 buffer pH 7.4.
Circular Dichroism (CD)

CD was performed as outlined in the General Materials and
Methods section.
NMR

The NMR structure of SCY01 and the CαH and NH
connectivities were determined as outlined in the General Materials and
Methods section.
Peptide Stability Tests
Stability in assay buffer

The SCY peptides showed extremely good stability in the hGH
assay buffer (RPMI-1640 medium supplemented with 10% (v/v) foetal
bovine serum (FBS) and 100 units/mL IL3). The peptides were incubated
as 1 mg/mL solutions in the buffer at 37°C. Samples were removed at
various time points, and peptide decomposition was followed by HPLC
for up to 72 hrs.
Blood Serum

Blood was collected in heparinised tubes by venepuncture.
The blood was centrifuged at 5000 rpm for 20 mins and the serum
decanted. The blood serum was stored at -20°C. A sample of the blood
serum (900 µL) was incubated with 100 µL of the stock peptide solution
(1 mg/mL in H2O) at 37°C and aliquots (100 µL) removed at the required
times. A solution of 50% CH3CN, 0.1% TFA was added to precipitate the
blood serum proteins, and the mixture was centrifuged at 13000 rpm for
5 mins. A sample of this solution (100 µL) was analysed by RP-HPLC
(Vydac C18 218TP54, 250 x 4.1 mm i.d., 1%/min gradient H2O/CH3CN
0.1% TFA) to detect peptide digestion.
Enzyme Stability Test

Trypsin

To the peptide solution (NH4HCO3, pH 8.3, 0.87 mg/mL) was
added trypsin (5% w:v). Samples were incubated at 37°C and aliquots
removed at 0, 1, 3 and 18 hrs and analysed by RP-HPLC as above.
Chymotrypsin

To the stock peptide solution (100 µL) was added 900 µL
NH4HCO3 (pH 8.3). Chymotrypsin was added to 5% w:v and the mixture
incubated at 37°C. Aliquots were removed at 0 hr, 1 hr and 24 hrs and
analysed by RP-HPLC.
Pepsin

To the stock peptide solution (100 µL) were added H2O (800
µL) and 0.1 M HCl (100 µL) to give pH 2.2. Pepsin was added to give a
1% w:v solution and the mixture incubated at 37°C. Aliquots were removed
at 0 h, 1 h and 24 hrs and analysed by RP-HPLC.

EXAMPLE 3
Growth Hormone - Low Affinity Site

The low affinity site of growth hormone comprises at least
12 residues. The Cα-Cβ vectors of these 12 residues were used in a
VECTRIX search. Pdb1zdc (ZDC) was returned as the best hit, with 9
search vectors matched at 1 Å tolerance. These residues were R8, L9,
D11, N12, L15, R16, R19, D116 and E119. Molecular modelling (Insight
II) was again used to optimise the hit. It was decided that the R29L
mutation (matching L9 of hGH) might disrupt the ZDC fold, and this
mutation was not incorporated. Furthermore, additional molecular
modelling studies suggested that ZDC could match a further 7 residues of
hGH. The residues that matched (15 residues; backbone-atom RMSd
between hit and hGH 1.46 Å) and were incorporated into ZDC05 were R8,
D11, N12, L15, R16, R19, Y111, D112, K115, D116, E118, E119, G120,
Q122 and T123. As shown in FIG. 9, the mutated framework ZDC05 was
produced by transfer of the above 15 matching hGH residues.
Experimental to Example 3

Vectrix results

Number of vectors searched: 12 - R8, L9, D11, N12, L15,
R16, R19, D112, L113, D116, E119, T123.
Number of different matches at 7 or more vector matches: 22
Number of unique frameworks at 7 or more vector matches: 6
Number of different frameworks selected (name: pdb code: number of vector
matches):

Protein A engineered fragment: pdb1zdc (9)

EXAMPLE 4 

Growth Hormone Agonist I 

The agonist site of hGH comprises 25 residues. The
Cα-Cβ vectors of these 25 residues were used in a VECTRIX search.
Pdb1vib (VIB) was returned as the best hit with 8 search vectors matched.
These residues were N12, R16, R19, D171, K172, E174, T175 and F176.
Molecular modelling determined that VIB could match a further 9 residues
of hGH. The residues that matched (17 residues; backbone-atom RMSd
between hit and hGH 0.86 Å) and were incorporated into
VIB01 were D11, N12, R16, R19, L20, H21, Q22, L23, F25, R167, K168,
D169, D171, K172, E174, T175 and F176. As shown in FIG. 2, the
mutated framework VIB01 was produced by transfer of the above 17
matching hGH residues.

The modelling program DelPhi (Honig & Nicholls, 1987,
supra) was used to compare the electrostatic potential maps of hGH and
the mimics, with the conclusion that there was good complementarity
between hGH and the mimics.

Molecular mechanics force-field minimisations and molecular dynamics
indicated that VIB01 positions the mutated residues in appropriate spatial
orientations to mimic hGH while retaining the native fold.

The VIB peptide (FIG. 2) was synthesised using solid phase
techniques (M. Schnolzer et al., International Journal of Peptide and
Protein Research, supra), purified and oxidised. The product was fully
characterised using mass spectrometry, HPLC and AAA. The secondary
structure of the engineered VIB molecule was checked by
circular dichroism, as illustrated in FIG. 10. The engineered VIB peptide
had a very stable structure and showed significant helical character in
aqueous conditions, as expected given that the native fold is a
helix-loop-helix motif.

In addition, the VECTRIX search identified peptide ERP as
a hit with 7 search vectors matched. These residues were N12, L15, R16,
H18, R19, T175 and R178. Molecular modelling determined that ERP
could match a further 6 residues of hGH. The residues that matched (13
residues; backbone-atom RMSd between hit and hGH 1.33 Å)
and were incorporated into ERP01 were R8, D11, N12, M14,
L15, R16, H18, R19, E174, T175, F176, R178 and I179. As shown in FIG.
11, the mutated framework ERP01 was produced by transfer of the above
13 matching hGH residues.

The modelling program DelPhi (Honig & Nicholls, 1987,
supra) was used to compare the electrostatic potential maps of hGH and
the mimics, with the conclusion that there was good complementarity
between hGH and the mimics.

Molecular mechanics force-field minimisations and molecular dynamics
indicated that ERP01 positions the mutated residues in appropriate spatial
orientations to mimic hGH while retaining the native fold.

ERP02 differed from ERP01 in containing the hGH affinity-matured
mutations E174S, I179T and H18D. The G14F mutation (F176
mimic) in ERP01 and ERP02 necessitated two major mutations, S6G and
N11G. ERP03 eliminated the G14F mutation, and with it the need for these
two mutations, giving a less perturbed sequence.

The ERP peptides 01-03 (FIG. 11) were synthesised using
solid phase techniques (M. Schnolzer et al., International Journal of
Peptide and Protein Research, supra), purified and oxidised. The products
were fully characterised using mass spectrometry, HPLC and AAA. The
secondary structure of the engineered ERP molecules was
checked by circular dichroism on ERP03 (FIG. 12). This showed a very
high degree of alpha-helical character, in agreement with the three-helix
bundle structure of the native ERP molecule.

NMR of ERP01 and ERP03 was carried out to check that
the 3 disulfide bonds had formed correctly. As expected, there is only
small deviation from the native ERP molecule, at the positions where the
mutations to mimic the hGH molecule were made (FIG. 13 for ERP03).
There is little or no deviation in the CαH or NH shifts compared to the
native ERP molecule, indicating correct folding and disulphide bond
connectivity and once again emphasising the ability to engineer new
surfaces onto disulfide-rich peptides whilst maintaining the native fold.
Experimental to Example 4: VIB 
Vectrix results 

Number of vectors searched: 25 - R8, L9, N12, L15, R16,
H18, R19, K41, L45, P45, E56, R64, Q68, Y103, D116, L117, E119,
T123, D171, K172, E174, T175, F176, R178 and I179.
Number of different matches: 61292 at a minimum of 5 vector matches
Number of unique frameworks: 10 at a minimum of 7 vectors, 1 at a
minimum of 8 vectors

Number of different frameworks selected (name: pdb code: # vector
matches)

Marine worm neurotoxin: pdb1vib (8)

Peptide Synthesis 

Synthesis of the VIB peptides was as described in the
General Materials and Methods section.
Oxidation of the VIB peptides

The reduced VIB peptides were oxidised using the methods
outlined for the ERP peptides, with 30% TFE solutions and a GSSG:GSH
oxidation shuttle.
Circular Dichroism 

CD was performed as outlined in the General Materials and 
Methods section. 

Experimental to Example 4: ERP molecule 

Synthesis of ERP peptides 

As described in the General Materials and Methods section. 
Folding of ERP peptides 

The peptide was dissolved at a low concentration in cold
water, to which trifluoroethanol was added to 30%. This was cooled at
4°C for two hours before oxidised and reduced glutathione were added
(10:100:1 GSSG:GSH:peptide); 1 M NH4HCO3 was then added to give a
0.1 M solution at pH 8.1. The oxidised peptides were isolated by HPLC.
Circular Dichroism 

CD was performed as outlined in the General Materials and
Methods section.
NMR of ERP01 and ERP03

The NMR structure of ERP01 and ERP03 and the Cα-Cβ
and Cα-NH connectivities were determined as outlined in the General
Materials and Methods section.

EXAMPLE 5 

Interleukin 4 (IL-4) 

IL-4 is a four-helix bundle cytokine that is the basis of the
allergic response mechanisms in asthma, rhinitis, conjunctivitis and
dermatitis. It plays an important role in the induction of immunoglobulins
through the activation of B-cells that produce IgM, IgE and IgG. IL-4
associates primarily with the IL-4 α receptor (IL-4Rα), which accounts for
nearly all of the binding affinity. The IL-4/IL-4Rα complex then
recruits the common γ chain to form the signalling heterodimer.
The functional epitope of IL-4 that determines the binding
affinity for the receptor α chain has been identified through mutational
analysis and from the recently determined crystal structure of the
IL-4/IL-4Rα complex (Hage et al., 1999, Cell 97 271). The key binding
event involves mainly charged residues from helices A and C of IL-4,
particularly Arg88 and Glu9.

The 13 amino acid residues of the binding surface of IL-4
were used as a query for the program VECTRIX. In this case the database
searched contained the structure of GCN4, a 31-residue leucine zipper
peptide. The GCN4 molecule was identified by VECTRIX as
a hit, matching 8 vectors of IL-4 (RMS 0.39 Å). Upon engineering and
synthesising this molecule to contain these 8 amino acids, an IL-4
agonist is expected with a potency of Kd 106 pM (Domingues et al.,
1999, Nat. Struct. Biol. 6 652).

An additional molecule, ZDC, was found that matches 10
vectors. Once synthesised, the engineered framework will be folded
and assayed.
Vectrix results 

Number of vectors searched: 13 - K77, R81, K84, R85, R88,
N89, W91, T13, E9, I5, R53, F82, K12
Total number of different matches at 7 or more: 396

Number of unique frameworks: 30

Number of different frameworks selected (name: pdb code: # vector
matches)

GCN4 peptide: pdb1zta (8)

Protein A fragment (engineered): pdb1zdc (10)

N.B. No molecule selected in the search matched Arg53.

EXAMPLE 6 

CD4-GP120

The CD4-GP120 interaction is the primary binding event
that allows the Human Immunodeficiency Virus (HIV) to enter a cell. The
crystal structure of CD4 has been known for some time (Wang et al.,
1990, Nature 348 411), but a structure of CD4 in complex with a highly
modified GP120 was only solved in June 1998 (Kwong et al., 1998,
Nature 393 648). Mutational analysis of CD4 (Fleury et al., 1991, Cell 66
1037) has long shown that the key amino acids involved in binding to
GP120 reside on a loop (CDR1) involving residues 41-47, together with the
key binding residue Arg59.

The Cα-Cβ vectors of these residues were used in a
VECTRIX search. Two molecules, SCY and PTA (FIG. 14), were identified
as potential matches. Both molecules were optimised using the design
procedure described above.

The biological activity of SCY is consistent with the studies
of Vita et al., 1998, Biopolymers 47 93.
Experimental for Example 6 
Vectrix results 

Number of vectors searched: 7 - K35, S42, F43, R59, D63,
Q40, L44.

Total number of different matches at 4 or more vector matches: 409
Number of unique frameworks: 116

Number of different frameworks selected (name: pdb code: # vector
matches)

Scorpion neurotoxin: pdb2pta (5)
Scyllatoxin: pdb1scy (4)

The SCY molecule is selected in the VECTRIX search only if the absolute
requirement for a match with Arg59 is removed.
Synthesis of PTACD4 and SCYCD4 molecules

As described in the General Materials and Methods section.
Oxidation of PTACD4 molecule

The PTA peptide was oxidised by stirring the peptide
overnight in 0.1 M NH4HCO3, pH 8.1. The oxidised peptide (2 forms) was
recovered by HPLC. Both folded forms were assayed separately.
Oxidation of the peptide under various conditions in the presence of
glutathione failed to yield folded peptide.
Oxidation of SCY molecule

The SCY CD4 molecule was oxidised using 5 mM GSSG to
0.5 mM GSH in NaPO4 buffer pH 7.4. The oxidised peptide was purified
by HPLC.
Biacore Assay 

GP120 was immobilised on a CM5 Biacore chip by NHS coupling.
CD4 is then passed over the GP120 surface, and the degree of binding is
assessed through both the association rate constant (kon) and the
dissociation rate constant (koff). CD4 is then equilibrated with the
inhibitor ligand and passed over the GP120. Using the Biacore
instrument, the degree to which the PTA or SCY ligand disrupts the
binding of CD4 to the chip is assessed.
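For reference, the equilibrium dissociation constant follows from the two rate constants as KD = koff/kon for a simple 1:1 interaction. The extent of disruption can be summarised as the percentage reduction in the CD4 binding response when the inhibitor is present. A minimal sketch follows. The function names and example values are hypothetical; in practice the rate constants would be obtained by fitting sensorgrams with the instrument's evaluation software.

```python
def equilibrium_kd(k_on: float, k_off: float) -> float:
    """K_D in M from k_on (M^-1 s^-1) and k_off (s^-1), for a simple 1:1 interaction."""
    if k_on <= 0:
        raise ValueError("k_on must be positive")
    return k_off / k_on


def percent_inhibition(response_without: float, response_with: float) -> float:
    """Percentage reduction in CD4 binding response (RU) caused by the inhibitor."""
    if response_without <= 0:
        raise ValueError("reference response must be positive")
    return 100.0 * (1.0 - response_with / response_without)


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    print(f"K_D = {equilibrium_kd(k_on=1.0e5, k_off=1.0e-3):.1e} M")  # K_D = 1.0e-08 M
    print(percent_inhibition(800.0, 200.0))  # 75.0
```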

EXAMPLE 7 

Interleukin 6 (IL-6)

Interleukin 6 (IL-6) is a cytokine that plays an important role
in the inflammation cascade, neural development, bone metabolism,
haematopoiesis, cell proliferation and immune response mechanisms.
IL-6 is a four-helix bundle cytokine that binds to the IL-6 α
receptor (IL-6Rα) and to the common receptor subunit GP130. The IL-6Rα
subunit does not play a role in intracellular signalling; this is carried out
through the ligand-dependent dimerisation of the associated GP130
receptor molecule. The full receptor complex is believed to be hexameric,
with two units each of IL-6, IL-6Rα and GP130. The pleiotropic effects of
IL-6 are thought to arise from this complex arrangement of the
heterotrimeric receptor complex. The interaction sites for both the IL-6Rα
and GP130 receptors have been well studied through site-specific
mutagenesis of both the receptor molecules and the IL-6 molecule. The
structure of IL-6 has been solved in both solution and crystal forms, and
the crystal structure of the GP130 receptor has recently been determined.

The IL-6Rα binding site on IL-6 (termed Site I) is
localised primarily to the end of helix D. Two additional sites, Sites II
and III, are responsible for binding the two different GP130 receptor
molecules. The two GP130 binding sites are spread over a wide area at
the opposite end of the molecule to the IL-6Rα binding site.

The IL-6 VECTRIX search described herein pertains only to
the IL-6Rα interaction. It does not relate to the GP130 receptor
interaction or the multi-receptor interactions (though the VECTRIX search
has been carried out for Sites II and III as well). No modelling of
the IL-6 residues onto any of the hit frameworks has been carried out. A
few examples of possible framework targets are listed below.
Vectrix results 

Number of vectors searched: 21; Subset 1 (Site I): 8 vectors;
Subset 2 (Sites II and III): 13 vectors.

Number of different matches at 8 or more vector matches for Site I: 179

Number of unique frameworks: 29

Number of different frameworks selected (name: pdb code: # vector
matches)

Protein A fragment (engineered): pdb1zdc (9)
Moloney murine leukemia virus fragment: pdb1mof (10)
Scyllatoxin: pdb1scy (8)

EXAMPLE 8 

G-CSF 

Granulocyte Colony Stimulating Factor (G-CSF) belongs to
the class of four-helix bundle cytokines and growth factors. It is involved
in the promotion of cell proliferation and differentiation leading to the
production of mature neutrophils. Its ability to replenish these neutrophils
in vivo makes it an attractive drug target. G-CSF functions through
dimerisation of the G-CSF receptor. Alanine-scanning mutagenesis has
been carried out on G-CSF to identify the key residues
involved in receptor recognition. The crystal structure of G-CSF has been
available since 1993 (Hill et al., 1993, Proceedings of the National
Academy of Sciences USA 90 5167) and the NMR structure since 1994
(Zink et al., 1994, Biochemistry 33 8453).

The VECTRIX search was done with an absolute
requirement for a vector matching the critical amino acid Phe145.
However, relatively few hits resulted, presumably because every hit was
required to match the Phe145 vector. Relaxing this absolute
requirement and refining the VECTRIX search are expected to yield a
larger number of hits.
Experimental to Example 8 
Vectrix results 

Number of vectors searched: 18
Number of different matches: 338

Number of unique frameworks: 115

Number of different frameworks selected (name: pdb code: # vector
matches)

Further refinement of the VECTRIX search is needed before probable
ligand frameworks can be selected.

GENERAL MATERIALS & METHODS

Design

Database searching and all design steps were carried out
on either an R10000 or R12000 SGI Octane workstation. Database
searching was performed with VECTRIX. Visualisation and peptide
mutations and modifications were performed using the Biosym/MSI (San
Diego) programs InsightII and Biopolymer, respectively.
Analysis of the electrostatic potential of the molecules was carried
out using DelPhi (Biosym/MSI, San Diego), while surface area
calculations were performed with Naccess (Hubbard & Thornton, 1993,
'NACCESS', Computer Program, Department of Biochemistry and
Molecular Biology, University College London). Molecular mechanics
minimisations and molecular dynamics calculations were performed on
the mutated frameworks to determine whether the native fold was
retained. Programs such as X-SITE (Laskowski et al., 1996, Journal of
Molecular Biology, pp. 175-201) were used to add additional functionality
to the mutated peptides.
Chemicals and Reagents 

Trifluoroacetic acid (TFA), dichloromethane (DCM),
dimethylformamide (DMF) and diisopropylethylamine (DIEA) were from
Auspep (Melbourne, Australia).
2-(1H-benzotriazol-1-yl)-1,1,3,3-tetramethyluronium
hexafluorophosphate (HBTU) was from Richelieu
Biotechnologies (St-Hyacinthe, Quebec, Canada). Acetonitrile was from
BDH Laboratory Supplies (Poole, U.K.), diethyl ether from Fluka
Biochemicals (Melbourne), 2-mercaptoethanol from Sigma (St. Louis,
MO, USA) and trifluoroethanol from Aldrich (Milwaukee, WI, USA). HF
was purchased from BOC Gases (Brisbane, Australia). The following
Nα-Boc-protected L-amino acids were purchased either from NovaBiochem
(La Jolla, CA, USA) or Bachem (Switzerland): Ala, Gly, Ile, Leu, Phe,
Pro, Val, Arg(Tos), Asp(OChx), Asn(Xanth), Glu(OChx), His(DNP),
Ser(Bzl), Thr(Bzl) and Tyr(2BrZ). MBHA polystyrene resin was
purchased from Peptide Institute (Kyoto, Japan).
HPLC Methods

Analytical and preparative HPLC was carried out using a
Waters HPLC system comprising a model 600 solvent delivery system, a
600E controller and a model 484 detector. Vydac C18 and C4 columns
were used: analytical (4.6 x 250 mm i.d.) at a flow rate of 1 mL/min,
semi-preparative (10 x 250 mm i.d.) at a flow rate of 3 mL/min and
preparative (22 x 250 mm i.d.) at a flow rate of 8 mL/min. All peptides
were purified using linear gradients of 0.1% aqueous TFA (solvent A) and
90% aqueous acetonitrile containing 0.09% TFA (solvent B).
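Solvent B is itself only 90% acetonitrile, so the acetonitrile content at any point in a linear gradient is 0.9 times the %B. The sketch below shows that arithmetic. The gradient programme in the example (0 to 60% B over 60 min, i.e. 1% B per minute) is hypothetical, since the text does not give the gradients used for purification.

```python
def percent_b(t_min: float, start_b: float, end_b: float, duration_min: float) -> float:
    """%B at time t for a linear gradient, held at the end value after `duration_min`."""
    if duration_min <= 0:
        raise ValueError("duration must be positive")
    t = min(max(t_min, 0.0), duration_min)
    return start_b + (end_b - start_b) * t / duration_min


def percent_acetonitrile(pct_b: float, acn_in_b: float = 90.0) -> float:
    """Acetonitrile content (%) of the mobile phase when solvent A contains none."""
    return pct_b * acn_in_b / 100.0


if __name__ == "__main__":
    b = percent_b(30.0, start_b=0.0, end_b=60.0, duration_min=60.0)
    print(b, percent_acetonitrile(b))  # 30.0 27.0
```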
Peptide Synthesis 

Peptides were synthesized using rapid manual HBTU in situ
neutralization synthesis techniques (Schnolzer et al., 1992, supra) on
a modified ABI 430A peptide synthesizer (Alewood et al., 1997, supra).
Each peptide was synthesized on MBHA resin on a 0.2 mmol scale
using 0.79 mmol/g NH2-substituted resin. Each amino acid was double
coupled using 2 mmol amino acid, 0.48 M HBTU (4 mL) and 1 mL DIEA,
for 10 min per coupling. The Boc group was removed by 2 x 1 min
treatments with TFA, with 1 min DMF flow washes of the resin.
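These quantities determine the resin mass and the reagent excess. The worked check below uses only the numbers given in the text: 0.2 mmol scale, 0.79 mmol/g loading, 2 mmol amino acid and 4 mL of 0.48 M HBTU. The helper names are hypothetical.

```python
def resin_mass_g(scale_mmol: float, loading_mmol_per_g: float) -> float:
    """Mass of resin (g) needed for a given synthesis scale and resin substitution."""
    return scale_mmol / loading_mmol_per_g


def equivalents(reagent_mmol: float, scale_mmol: float) -> float:
    """Molar excess of a reagent relative to the resin-bound amine."""
    return reagent_mmol / scale_mmol


if __name__ == "__main__":
    scale = 0.2  # mmol
    hbtu_mmol = 0.48 * 4.0  # 0.48 M x 4 mL = 1.92 mmol
    print(round(resin_mass_g(scale, 0.79), 3))      # 0.253 (g of resin)
    print(round(equivalents(2.0, scale), 1))        # 10.0 (equiv. amino acid)
    print(round(equivalents(hbtu_mmol, scale), 1))  # 9.6 (equiv. HBTU)
```

This gives 10 equivalents of amino acid and slightly fewer (9.6) of HBTU per coupling. Using slightly less uronium activator than amino acid is common practice.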

At the completion of the synthesis, the His(DNP) group, if
present in a particular sequence, was removed using 20%
mercaptoethanol in 10% DIEA/DMF (3 x 30 min treatments).
The peptide resin was cleaved using HF with p-cresol and p-thiocresol as
scavengers (90:8:2) at -5 to 0°C for 2 hrs. If Trp(CHO) is present in a
sequence, the formyl group is removed by treatment with ethanolamine.
The HF was removed in vacuo, the peptides were triturated with cold
diethyl ether (3 x 50 mL), and the precipitated peptide was collected and
dissolved in 50% acetonitrile with 0.1% TFA to give the crude peptide.
The crude peptide (~80 mg lots) was purified by RP-HPLC, and fractions
were collected and analysed by analytical RP-HPLC and ESMS. Fractions
containing the purified peptide were combined and lyophilised.

Mass spectral data were collected using a Perkin Elmer
Sciex (Toronto, Canada) API III Biomolecular Mass Analyzer ion-spray
mass spectrometer equipped with an ABI 140B solvent delivery system.
Raw data were analyzed using the program MassSpec (Perkin Elmer
Sciex). Calculated masses were obtained using the program MacProMass
(Sunil Vemuri & Terry Lee, City of Hope, Duarte, CA).
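The calculated masses compared with the ESMS data can be approximated from the sequence. The minimal sketch below uses standard average residue masses and two assumptions: (i) a C-terminal amide, as normally obtained from MBHA resin after HF cleavage, and (ii) a loss of two hydrogen atoms (about 2.016 Da) for each disulfide bond formed on oxidation. It is not MacProMass, whose method is not described here. The function name and example sequence are hypothetical.

```python
# Average residue masses (Da), i.e. amino acid mass minus water.
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01524        # added once for the free N- and C-termini
HYDROGEN = 1.00794
AMIDE_DELTA = -0.98476  # C-terminal -OH replaced by -NH2


def average_mass(sequence: str, disulfides: int = 0, c_terminal_amide: bool = False) -> float:
    """Average neutral mass (Da) of a peptide with optional disulfides and C-terminal amide."""
    mass = sum(AVERAGE_RESIDUE_MASS[aa] for aa in sequence.upper()) + WATER
    mass -= 2 * HYDROGEN * disulfides  # each S-S bond removes two H atoms
    if c_terminal_amide:
        mass += AMIDE_DELTA
    return mass


if __name__ == "__main__":
    seq = "ACGCKLCHCE"  # hypothetical sequence containing four Cys
    reduced = average_mass(seq, c_terminal_amide=True)
    oxidised = average_mass(seq, disulfides=2, c_terminal_amide=True)
    print(f"reduced {reduced:.2f} Da, oxidised {oxidised:.2f} Da, "
          f"difference {reduced - oxidised:.2f} Da")
```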
Ultraviolet Circular Dichroism (CD)

Far-UV CD spectra were recorded using a Jasco 710 CD
spectrometer with its associated PC-based software. CD spectra are
presented as plots of mean residue molar ellipticity [θ] (deg cm² dmol⁻¹)
versus wavelength in 0.1 nm increments. The digitised data were plotted
using the KaleidaGraph program on a Macintosh. All peptide
concentrations were determined by quantitative amino acid analysis.
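Converting the instrument reading to mean residue ellipticity uses [θ] = θobs / (10 · c · l · n). Here θobs is in millidegrees, c is the molar peptide concentration, l is the path length in cm, and n is the number of residues (some authors use the number of peptide bonds, n − 1). The text does not state which convention was used. A minimal sketch of the conversion follows; the function name and example values are hypothetical.

```python
def mean_residue_ellipticity(theta_mdeg: float, conc_molar: float,
                             path_cm: float, n_residues: int) -> float:
    """Mean residue molar ellipticity in deg cm^2 dmol^-1.

    theta_mdeg : observed ellipticity (millidegrees)
    conc_molar : peptide concentration (mol/L), e.g. from amino acid analysis
    path_cm    : cell path length (cm)
    n_residues : number of residues (or peptide bonds, depending on convention)
    """
    if conc_molar <= 0 or path_cm <= 0 or n_residues <= 0:
        raise ValueError("concentration, path length and residue count must be positive")
    return theta_mdeg / (10.0 * conc_molar * path_cm * n_residues)


if __name__ == "__main__":
    # Hypothetical reading: -20 mdeg at 222 nm, 50 uM peptide, 0.1 cm cell, 31 residues.
    print(round(mean_residue_ellipticity(-20.0, 50e-6, 0.1, 31)))
```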
¹H NMR spectroscopy

All NMR experiments were recorded on a Bruker ARX 500
spectrometer equipped with a z-gradient unit. Peptide concentration was
approximately 3 mM in 95% H2O/5% D2O (T = 293 K). Spectra recorded
included NOESY (Kumar et al., 1980, Biochem. Biophys. Res. Comm. 95
1; Jeener et al., 1979, 71 4546) with a mixing time of 400 ms, and
TOCSY (Bax & Davis, 1985, 65 355) with a mixing time of 85 ms.
Spectra were run over 5550 Hz with 4K data points, 512 FIDs, 32-64
scans and a recycle delay of 1 s. The solvent was suppressed using the
WATERGATE sequence (Piotto et al., J. Biomol. NMR, 1992, 2 661).
Spectra were processed using UXNMR. FIDs were multiplied by a
polynomial function and apodised using a 90°-shifted sine-bell function in
both dimensions prior to Fourier transformation. Baseline correction using
a fifth-order polynomial was applied, and chemical shift values were
referenced externally to DSS at 0.00 ppm. The random coil ¹H chemical
shift values of Wishart et al., 1995, J. Biomol. NMR 6 135, were used.
Spectra were assigned using the methods of Wüthrich, 1986, NMR
of Proteins and Nucleic Acids, Wiley-Interscience, NY.
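A sine-bell window shifted by 90° starts at its maximum and decays to zero at the end of the acquisition, trading some resolution for sensitivity. The minimal NumPy sketch below uses one common definition, w(t) = sin(φ + (π − φ)·t/T) with φ = 90°. The function name and test FID are hypothetical, and this is not presented as UXNMR's implementation, whose parameterisation may differ.

```python
import numpy as np


def shifted_sine_bell(n_points: int, shift_deg: float = 90.0) -> np.ndarray:
    """Sine-bell window of length n_points, phase-shifted by shift_deg degrees."""
    if n_points < 2:
        raise ValueError("need at least two points")
    phi = np.deg2rad(shift_deg)
    t = np.arange(n_points) / (n_points - 1)  # fraction of the acquisition, 0..1
    return np.sin(phi + (np.pi - phi) * t)


if __name__ == "__main__":
    # Hypothetical decaying complex FID, apodised before Fourier transformation.
    n = 4096
    fid = np.exp(2j * np.pi * 0.05 * np.arange(n)) * np.exp(-np.arange(n) / 800.0)
    spectrum = np.fft.fftshift(np.fft.fft(fid * shifted_sine_bell(n)))
    print(spectrum.shape)  # (4096,)
```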
Growth Hormone Proliferation Assay 

BaF-B03 cells (a pro-B cell line) that stably express the
human Growth Hormone Receptor (hGHR) are used in this assay since
they are able to elicit a GH-specific response at concentrations as low as
0.1 ng/mL hGH (4.54 pM). These cells also endogenously express the
IL-3 receptor and require IL-3 or GM-CSF to survive in culture. The assay
is based on that of Mosmann, 1983, J. Immunol. Meth. 65, 55, and
involves the following procedure:
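The molar figure quoted for 0.1 ng/mL hGH follows from a simple unit conversion. The sketch below assumes a molar mass of 22,000 g/mol for hGH (an approximate value, chosen because it reproduces the 4.54 pM quoted above); the function and constant names are hypothetical.

```python
def ng_per_ml_to_pm(conc_ng_per_ml: float, molar_mass_g_per_mol: float) -> float:
    """Convert a mass concentration in ng/mL to a molar concentration in pM."""
    grams_per_litre = conc_ng_per_ml * 1e-9 * 1e3    # ng/mL -> g/L
    molar = grams_per_litre / molar_mass_g_per_mol   # g/L -> mol/L
    return molar * 1e12                              # mol/L -> pmol/L (pM)


HGH_MOLAR_MASS = 22_000.0  # assumed approximate molar mass of hGH, g/mol
```

Under the same assumption, the 0.5 ng/mL hGH used in step (v) below corresponds to roughly 22.7 pM.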

(i) culture cells in RPMI-1640 medium supplemented
with 10% (v/v) foetal bovine serum (FBS) and 100
units/mL IL-3 under 5% CO₂ at 37°C. Allow the
culture to reach mid-log growth phase;

(ii) centrifuge cells at 500 × g and wash with PBS to
remove IL-3 from the culture medium. Repeat the
centrifugation and resuspend in 1 mL of RPMI-1640
plus 0.5% (v/v) FBS. Count cells and dilute to a
concentration of 8 x 10^ cells/mL in the same medium;

(iii) from a constantly stirred suspension, add 50 µL of
cells to each well of two 96-well plates;

(iv) prepare stock solutions of the mimetic to be tested at
various concentrations such that the final
concentration ranges from 100 nM to 100 mM, made
up in 0.5% FBS medium (the final volume is 150 µL,
therefore stocks should be 3 times the final concentration
required). Add 50 µL of these solutions to cells in
sextuplicate (i.e. wells A1 to A6 are identical, etc.);

(v) prepare a 3× stock solution of hGH such that
the final concentration is 0.5 ng/mL and add 50 µL to
each well of one plate. Include one row as a negative
control with no cytokine;

(vi) prepare a 3× stock solution of IL-3 such that
the final concentration is 50 units/mL and add 50 µL
to each well of the other plate. Include one row as a
negative control with no cytokine;

(vii) incubate plates with no lids (to prevent uneven
evaporation rates) in a vented humidified box under
the abovementioned incubation conditions. Allow
incubation to continue for 24 hrs;

(viii) add 50 µL of 4 mg/mL MTT (3-[4,5-dimethylthiazol-2-
yl]-2,5-diphenyltetrazolium bromide) to each well and
incubate for a further 3 hrs;

(ix) to stop the assay, remove the plates from the incubator and lyse cells
by adding 120 µL of isopropanol and triturating for
several seconds per well or until cells are clearly
lysed. Allow plates to rest in the dark for 5 minutes
before reading;

(x) read plates at 595 nm on a microplate reader. The absorbance values
obtained are directly proportional to cell number (as
measured by mitochondrial dehydrogenase activity).
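Steps (iii) to (vi) add 50 µL of cells, 50 µL of test solution and 50 µL of cytokine to each well, which is why every stock is made at three times its final concentration. The sketch below encodes that arithmetic together with one possible way of summarising the sextuplicate absorbance readings. The blank subtraction, the 0 % and 100 % reference wells and all function names are assumptions for illustration; the protocol above does not specify how the readings are analysed.

```python
from statistics import mean, stdev
from typing import List, Tuple

WELL_VOLUME_UL = 150.0  # 50 uL cells + 50 uL test solution + 50 uL cytokine
ADDITION_UL = 50.0      # volume of each addition


def stock_concentration(final_conc: float,
                        total_ul: float = WELL_VOLUME_UL,
                        added_ul: float = ADDITION_UL) -> float:
    """Stock concentration giving final_conc once added_ul is diluted to total_ul."""
    return final_conc * total_ul / added_ul


def summarise_replicates(a595: List[float], blank: float) -> Tuple[float, float]:
    """Mean and sample SD of replicate A595 readings after subtracting a blank.

    The blank (e.g. medium + MTT, no cells) is a hypothetical extra control.
    """
    corrected = [reading - blank for reading in a595]
    return mean(corrected), stdev(corrected)


def percent_response(test: float, no_cytokine: float, cytokine_only: float) -> float:
    """Express a reading on a 0 % (no cytokine) to 100 % (cytokine alone) scale."""
    return 100.0 * (test - no_cytokine) / (cytokine_only - no_cytokine)
```

For example, a final hGH concentration of 0.5 ng/mL requires a 1.5 ng/mL stock, and a final IL-3 concentration of 50 units/mL requires a 150 units/mL stock.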

CONCLUSIONS 
These studies have shown that by engineering small, 
cysteine-rich proteins, a stable mimetic with high bioavailability can be 
made with desired biological characteristics, in this case the ability to 
antagonize the biological action of hGH. Furthermore, the database 
searching strategy of the present invention has shown that suitable 
"frameworks" for engineering mimetics can be identified according to 
aspects of structure which are shared with a sample protein that 
possesses a function of interest. The framework so identified will 
advantageously have increased stability compared to the sample protein. 
Finally, frameworks identified by the method of the invention may be 
suitable for further amino acid sequence modification so as to impart a 
function of the sample protein, or a function antagonistic thereto. 

The present invention therefore provides a new strategy for 
the engineering of proteins, which strategy is particularly applicable to the 
engineering of mimetics which may constitute the next generation of 
therapeutics. 
It will be understood by the skilled person that the invention
is not limited to the particular embodiments described in detail herein, but
also includes other embodiments consistent with the broad spirit and
scope of the invention.
TABLE 2 Blood serum stability test results

Control peptide    partially digested after 3 mins    fully digested
SCY01              stable                             stable            stable

TABLE 3 Enzyme stability test results

trypsin           Digested in 1 hr    Stable over 18 hrs
α-chymotrypsin    Digested in 1 hr    Stable over 18 hrs
pepsin            Digested in 1 hr    Stable over 18 hrs
Scheme A: An overview of the functioning of the program Vectrix.



Vectrix query.file database.file steric.file min_match

query.file contains the xyz coordinates and the tolerance for
each atom in the query, and defines subsets of atoms.

database.file contains a list of PDB files which constitute the
database.

steric.file contains the xyz coordinates of the grid points
defining the receptor or ligand space.

min_match is an integer defining the minimum number of
matches required for a hit.



1. Open the query file and calculate the Cα-Cβ distance matrix.

2. Open each database entry in turn and calculate its Cα-Cβ distance
matrix. When there are no more entries, exit.

3. Perform clique detection. If the entry is not a hit, proceed to the
next entry.

4. If the entry is a hit, superimpose the hit onto the query.

5. Count the number of steric invasions and the number of matches
within the defined subsets.

6. Output the result.
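Scheme A names clique detection on Cα-Cβ distance matrices but gives no further detail. The sketch below shows one common way such a search can be set up: a correspondence graph whose nodes pair a query vector with a database vector, searched for its largest clique. It is not the Vectrix source code. Representing each vector by its Cα and Cβ coordinates, using a single tolerance `tol` (Vectrix reads a tolerance per query atom), the Bron-Kerbosch clique search, and the hypothetical `cutoff` that defines a steric invasion are all assumptions, as are the function names.

```python
import math
from itertools import combinations
from typing import Dict, List, Optional, Set, Tuple

Point = Tuple[float, float, float]
Vector = Tuple[Point, Point]          # (CA, CB) coordinates of one residue
Node = Tuple[int, int]                # (query index, target index)


def pair_distances(u: Vector, v: Vector) -> Tuple[float, float]:
    """CA-CA and CB-CB distances between two residues' CA->CB vectors."""
    return math.dist(u[0], v[0]), math.dist(u[1], v[1])


def correspondence_graph(query: List[Vector], target: List[Vector],
                         tol: float) -> Dict[Node, Set[Node]]:
    """Join two (query, target) pairings when both inter-vector distances
    agree within tol (Angstrom)."""
    nodes = [(i, a) for i in range(len(query)) for a in range(len(target))]
    adj: Dict[Node, Set[Node]] = {n: set() for n in nodes}
    for (i, a), (k, b) in combinations(nodes, 2):
        if i == k or a == b:
            continue  # keep the mapping one-to-one
        dq = pair_distances(query[i], query[k])
        dt = pair_distances(target[a], target[b])
        if all(abs(x - y) <= tol for x, y in zip(dq, dt)):
            adj[(i, a)].add((k, b))
            adj[(k, b)].add((i, a))
    return adj


def max_clique(adj: Dict[Node, Set[Node]]) -> Set[Node]:
    """Largest clique found by Bron-Kerbosch with pivoting."""
    best: Set[Node] = set()

    def expand(r: Set[Node], p: Set[Node], x: Set[Node]) -> None:
        nonlocal best
        if not p and not x:
            if len(r) > len(best):
                best = set(r)
            return
        pivot = max(p | x, key=lambda n: len(adj[n] & p))
        for n in list(p - adj[pivot]):
            expand(r | {n}, p & adj[n], x & adj[n])
            p = p - {n}
            x = x | {n}

    expand(set(), set(adj), set())
    return best


def find_hit(query: List[Vector], target: List[Vector], tol: float = 1.0,
             min_match: int = 3) -> Optional[List[Node]]:
    """Matched (query, target) index pairs, or None if fewer than min_match."""
    clique = max_clique(correspondence_graph(query, target, tol))
    return sorted(clique) if len(clique) >= min_match else None


def count_steric_invasions(atoms: List[Point], grid: List[Point],
                           cutoff: float = 1.5) -> int:
    """Atoms of a superimposed hit lying within cutoff of any grid point."""
    return sum(1 for a in atoms if any(math.dist(a, g) <= cutoff for g in grid))
```

Because the graph compares internal distances only, a hit is found regardless of where the matching residues sit in the database structure; superposition (step 4) is only needed afterwards to place the hit in the query frame for the steric check.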
Scheme B

Calculate the QUERY distance matrix of Cα-Cβ vectors.

Open the 3D-database file.

Process, in turn, each 3D DATABASE structure:

1. Calculate the structure distance matrix of Cα-Cβ vectors.

2. Process, in turn, each structural vector:

Anchor, in turn, each query vector at this structural vector:

1) Select candidate vectors for the other query
positions based upon distance constraints.

2) Systematically evaluate each combination of
query candidates to see if a possible query solution
exists based upon the minimum number of query
vectors required.

3) If a possible solution exists, verify that all query
positions are progressively linked.

4) If the solution is valid, perform translations/rotations
upon all vectors to fit the query solution and calculate
the root mean square difference.

5) Write out atomic coordinates.

6) Score the hit.
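Step 4 of Scheme B does not say how the translations and rotations are computed. One standard least-squares choice is the Kabsch algorithm, sketched below with NumPy under that assumption. `kabsch_superimpose` is a hypothetical helper name; the point sets would be the Cα and Cβ coordinates of the matched vectors, listed in corresponding order.

```python
import numpy as np


def kabsch_superimpose(mobile: np.ndarray, reference: np.ndarray):
    """Least-squares rotation + translation of mobile (N x 3) onto reference.

    Returns the fitted coordinates and the root mean square difference.
    """
    mob_centroid = mobile.mean(axis=0)
    ref_centroid = reference.mean(axis=0)
    p = mobile - mob_centroid
    q = reference - ref_centroid
    h = p.T @ q                                  # 3 x 3 covariance matrix
    u, _, vt = np.linalg.svd(h)
    # Flip the last axis if needed so the result is a proper rotation.
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    fitted = p @ rotation.T + ref_centroid
    rmsd = float(np.sqrt(np.mean(np.sum((fitted - reference) ** 2, axis=1))))
    return fitted, rmsd
```

The determinant check matters because an unconstrained SVD fit can return a mirror image, which would give an artificially low RMS difference for a chirally incorrect match.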



WO 00/23474    PCT/AU99/00914




Scheme C 



Automatic weekly job

Find likely new frameworks
Creates a new sequentially named directory for candidate structures (dirX).

Finds files created in the last 7 days and filters for entries with small
peptides (<70) and disulfides. Files are copied to directory dirX and
clipped to one conformer. A source log file is created for InsightII.

Manual visual check

Files are viewed in InsightII and rejected files are discarded.

Database build

After the manual visual check, run script/dbjbuild in Search Database,
which performs the following functions:

Finds useful hits in directory dirX and copies these files from the pdb
database to Search Database.

Files are then split and renamed.

The database is cleaned up.

The original pdb files of the hits in the pdb database are linked to Unique
Database to produce an easily accessible record of all entries in the Search
Database for characterisation etc.

The new entry list is appended to the MASTER_DB_LIST with the date.
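The automatic filtering stage of Scheme C could be approximated as below. This is a hypothetical sketch, not the scripts described above. The directory layout, the function names, the use of file modification time in place of creation time (which is not portable), reading "<70" as fewer than 70 residues counted from ATOM records, and detecting disulfides from SSBOND records are all assumptions. The InsightII log file, the visual check and the splitting and renaming steps are not reproduced.

```python
import time
from datetime import date
from pathlib import Path
from typing import List, Optional

SECONDS_PER_WEEK = 7 * 24 * 3600


def residue_count(pdb_text: str) -> int:
    """Number of distinct residues in ATOM records of the first model."""
    seen = set()
    for line in pdb_text.splitlines():
        if line.startswith("ENDMDL"):
            break  # ignore later conformers of NMR ensembles
        if line.startswith("ATOM"):
            # chain ID (column 22), residue number (23-26), insertion code (27)
            seen.add((line[21:22], line[22:26], line[26:27]))
    return len(seen)


def first_model_only(pdb_text: str) -> str:
    """Clip an ensemble to its first conformer (text up to the first ENDMDL)."""
    kept = []
    for line in pdb_text.splitlines(keepends=True):
        kept.append(line)
        if line.startswith("ENDMDL"):
            break
    return "".join(kept)


def is_candidate(pdb_text: str, max_residues: int = 70) -> bool:
    """Small (< max_residues) entries that declare at least one disulfide."""
    has_ssbond = any(line.startswith("SSBOND") for line in pdb_text.splitlines())
    return has_ssbond and residue_count(pdb_text) < max_residues


def next_dir(base: Path) -> Path:
    """Create the next sequentially named directory: dir1, dir2, ..."""
    n = 1
    while (base / f"dir{n}").exists():
        n += 1
    new_dir = base / f"dir{n}"
    new_dir.mkdir(parents=True)
    return new_dir


def weekly_job(pdb_dir: Path, work_dir: Path,
               now: Optional[float] = None) -> List[Path]:
    """Copy recent, small, disulfide-containing entries into a new dirX."""
    now = time.time() if now is None else now
    out_dir = next_dir(work_dir)
    copied = []
    for path in sorted(pdb_dir.glob("*.pdb")):
        # st_mtime stands in for "created": creation time is not portable.
        if now - path.stat().st_mtime > SECONDS_PER_WEEK:
            continue
        text = path.read_text()
        if is_candidate(text):
            dest = out_dir / path.name
            dest.write_text(first_model_only(text))
            copied.append(dest)
    return copied


def append_to_master_list(master: Path, entries: List[str]) -> None:
    """Append new entry names to the master list, each tagged with today's date."""
    stamp = date.today().isoformat()
    with master.open("a") as fh:
        for entry in entries:
            fh.write(f"{stamp}\t{entry}\n")
```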